In [77]:
'''
Final Project Tutorial

Joed Quaye
Ronald Chomnou
Mark Spooner
Griffin Araujo
'''
# necessary imports

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import requests
import numpy as np
from functools import reduce
import pandas as pd
import matplotlib.pyplot as plt
import re
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import seaborn as sns
import statsmodels.api as sm


# looking at the past 6 seasons for data analysis
URL2324 = "https://www.basketball-reference.com/leagues/NBA_2024_per_game.html"
URL2223 = "https://www.basketball-reference.com/leagues/NBA_2023_per_game.html"
URL2122 = "https://www.basketball-reference.com/leagues/NBA_2022_per_game.html"
URL2021 = "https://www.basketball-reference.com/leagues/NBA_2021_per_game.html"
URL1920 = "https://www.basketball-reference.com/leagues/NBA_2020_per_game.html"
URL1819 = "https://www.basketball-reference.com/leagues/NBA_2019_per_game.html"

URLMIP1920 = "https://www.basketball-reference.com/awards/awards_2020.html"
URLMIP2021 = "https://www.basketball-reference.com/awards/awards_2021.html"
URLMIP2122 = "https://www.basketball-reference.com/awards/awards_2022.html"
URLMIP2223 = "https://www.basketball-reference.com/awards/awards_2023.html"
URLMIP2324 = "https://www.basketball-reference.com/awards/awards_2024.html"

# URL1718 = "https://www.basketball-reference.com/leagues/NBA_2018_per_game.html"
Introduction

The goal of this project is to walk you through the full data analysis process, using basketball performance metrics as the subject. We evaluate NBA player performance data from the past six seasons to identify trends and correlations that can help predict the Most Improved Player (MIP) in future seasons. The MIP award is given annually to the player who has shown the most significant improvement in their performance, making it a natural subject for data-driven analysis.

Why is this important? Recognizing and predicting the Most Improved Player can provide valuable insights into player development, scouting, and team strategy. Understanding the factors that contribute to a player's improvement can help teams invest in potential stars early, optimize training programs, and enhance overall team performance. Moreover, fans and analysts alike can gain a deeper appreciation for the game's dynamics and the players' growth trajectories.

Throughout this project, we will utilize a variety of data science techniques to analyze player statistics, performance metrics, and other relevant data points. By examining past MIP winners and comparing their performance data to other players, we aim to uncover patterns and predictive indicators. Our analysis will focus on several key aspects:

Data Collection: Gathering comprehensive player data from the past six years, including points per game, offensive and defensive rebounds, assists, blocks, and steals.
Data Cleaning and Preparation: Ensuring the data is accurate, complete, and formatted for analysis.
Exploratory Data Analysis (EDA): Visualizing and summarizing the data to identify trends, anomalies, and initial insights.
Feature Engineering: Creating new variables and metrics that might be significant predictors of improvement.
Modeling and Prediction: Applying various statistical and machine learning models to predict future MIP candidates based on historical data.
Evaluation and Interpretation: Assessing the performance of our models and interpreting the results to draw meaningful conclusions.

By following this structured approach, we aim to provide a robust analysis that not only identifies potential future MIP candidates but also enhances our understanding of the factors driving player improvement in professional basketball.

Data Collection

The first task is to gather the datasets for our analysis: player performance data for the past six seasons and the Most Improved Player (MIP) voting results for each year. Comparing the two lets us look for trends and correlations that could help predict future MIP winners.

To achieve this, we need per-game statistics for every player in each of the last six seasons (2018-19 through 2023-24), as well as the MIP rankings. The MIP rankings are collected for the five most recent of those seasons (2019-20 through 2023-24). We use web scraping to collect this data from Basketball Reference, with Selenium loading each page and BeautifulSoup parsing the resulting HTML.
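
As a minimal sketch of the parsing step, the snippet below runs BeautifulSoup on a small inline HTML string instead of a live page, so no browser is needed. The table id `demo_stats`, the three-column layout, the helper name `parse_rows`, and all values are made up for illustration; the real pages have many more columns.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded stats page.
SAMPLE_HTML = """
<table id="demo_stats">
  <tbody>
    <tr><th>1</th><td>Player A</td><td>PG</td><td>25.7</td></tr>
    <tr class="thead"><th>Rk</th><th>Player</th><th>Pos</th><th>PTS</th></tr>
    <tr><th>2</th><td>Player B</td><td>C</td><td>11.7</td></tr>
  </tbody>
</table>
"""

def parse_rows(html_text, table_id):
    """Return {player: {"position": ..., "ppg": ...}} for one table."""
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table", {"id": table_id})
    result = {}
    for row in table.find("tbody").find_all("tr"):
        cells = row.find_all("td")
        if not cells:  # repeated header rows contain only <th> cells
            continue
        name = cells[0].text.strip()
        result[name] = {
            "position": cells[1].text.strip(),
            "ppg": cells[2].text.strip(),
        }
    return result

parsed = parse_rows(SAMPLE_HTML, "demo_stats")
print(parsed)
```

Note that every value comes back as a string; converting to numbers is a separate step.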

Player Statistics Data Scraping

The following function takes a URL and returns a dictionary with the corresponding player data as output:

In [78]:
# labels for cells[1] through cells[28] of each row, in table order
STAT_COLUMNS = [
    "position", "age", "team", "games played", "games started",
    "minutes played per game", "field goals", "field goal attempts",
    "fg percentage", "3pt per game", "3pt attempts", "3pt percentage",
    "2pt per game", "2pt attempts", "2pt percentage",
    "effective fg percentage", "free throws", "free throw attempts",
    "free throw percentage", "offensive rebounds", "defensive rebounds",
    "total rebounds", "assists", "steals", "blocks", "turnovers",
    "personal fouls", "ppg",
]

# function takes in a URL and returns a dictionary with corresponding data as output 
def data_scrape(URL):
    # new webdriver
    driver = webdriver.Safari()
    try:
        driver.get(URL)
        # reading the data as HTML
        html = BeautifulSoup(driver.page_source, 'html.parser')
    finally:
        # always close the browser, even if loading the page fails
        driver.quit()

    table = html.find('table', {'id': 'per_game_stats'})
    rows = table.find('tbody').find_all('tr')

    player_stats = {}
    totals = {}  # combined-season ('TOT') rows for players who changed teams

    for row in rows:
        # cells holds the text of every data column in the row;
        # repeated header rows have no <td> cells and are skipped
        cells = [cell.text.strip() for cell in row.find_all('td')]
        if len(cells) <= len(STAT_COLUMNS):
            continue
        # first instance of td is the name, the fourth is the team
        player_name, team = cells[0], cells[3]

        # a player with multiple teams has a 'TOT' row first, then one row per team
        if team == 'TOT':
            totals[player_name] = cells
            continue
        # only keep the first team row for each player
        if player_name in player_stats:
            continue

        # use the combined 'TOT' stats when they exist, but keep the real team name
        source = totals.get(player_name, cells)
        stats = dict(zip(STAT_COLUMNS, source[1:]))
        stats["team"] = team
        player_stats[player_name] = stats

    # returning the player stats
    return player_stats

# obtaining all season data
first_season = data_scrape(URL2324)
second_season = data_scrape(URL2223)
third_season = data_scrape(URL2122)
fourth_season = data_scrape(URL2021)
fifth_season = data_scrape(URL1920)
sixth_season = data_scrape(URL1819)
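
The handling of players who changed teams mid-season can be illustrated without a browser. The sketch below applies the same rule as `data_scrape` to hand-written rows: take the statistics from the combined `TOT` row, but record the first individual team listed after it. The row tuples, the values, and the helper name `merge_traded` are hypothetical.

```python
def merge_traded(rows):
    """rows: list of (player, team, ppg) tuples in table order."""
    totals = {}   # combined-season stats for players with a 'TOT' row
    merged = {}
    for player, team, ppg in rows:
        if team == "TOT":
            totals[player] = ppg
            continue
        if player in merged:  # later teams of a traded player are ignored
            continue
        merged[player] = {"team": team, "ppg": totals.get(player, ppg)}
    return merged

sample_rows = [
    ("Player A", "TOT", "7.6"),
    ("Player A", "TOR", "7.7"),
    ("Player A", "NYK", "7.5"),
    ("Player B", "MIA", "19.3"),
]
merged = merge_traded(sample_rows)
print(merged)
```

Player A ends up with the team from the first team row and the scoring average from the `TOT` row, while Player B, who never changed teams, is copied through unchanged.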

Collecting MIP Rankings

The function below scrapes the MIP voting table from an awards page and returns a dictionary that maps each ranked player to their rank. It is called once per season with the URLs stored in URLMIP1920, URLMIP2021, URLMIP2122, URLMIP2223, and URLMIP2324.

In [80]:
# Scrape data for all the MIP ranked tables for last 5 seasons
def mip_scrape(URL):

    driver = webdriver.Safari()

    driver.get(URL)

    # Parse the HTML content of the page
    html = BeautifulSoup(driver.page_source, 'html.parser')

    # Find the table containing the most improved players
    table = html.find('table', {'id': 'mip'})

    mipMap = {}
    body = table.find('tbody')
    
    # Extract the table rows
    rows = body.find_all('tr')
    for row in rows:
        # the row header (<th>) holds the rank; the first <td> holds the player's name
        rank = row.find('th').text.strip()
        cells = row.find_all('td')
        player_name = cells[0].text.strip()
        # Removes the T from the ranking that indicates a tie in voting, keeping only the digits
        mipMap[player_name] = {"Rank": re.sub(r'\D', '', rank)}

    driver.quit()
    return mipMap

mipTable1920 = mip_scrape(URLMIP1920)
mipTable2021 = mip_scrape(URLMIP2021)
mipTable2122 = mip_scrape(URLMIP2122)
mipTable2223 = mip_scrape(URLMIP2223)
mipTable2324 = mip_scrape(URLMIP2324)
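
The rank cleanup used in `mip_scrape` can be checked in isolation. This small sketch applies the same `re.sub(r'\D', '', ...)` call to a few made-up rank strings; the helper name `clean_rank` is hypothetical.

```python
import re

def clean_rank(raw_rank):
    """Drop every non-digit character, such as the 'T' that marks a tie."""
    return re.sub(r"\D", "", raw_rank)

for raw in ["1", "5T", "12T"]:
    print(raw, "->", clean_rank(raw))
```

The result is still a string, so it would need `int(...)` before being used in numeric comparisons.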

In this section, we use Pandas and NumPy to manipulate and organize our data as Pandas DataFrames. If you are new to these libraries, the official Pandas and NumPy documentation is a good place to explore what they offer.

Our goal here is to clean and organize the player statistics and MIP rankings collected from the various seasons into a format that is ready for analysis.
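
The scraped values are strings (for example `.501`), and some cells are empty, such as the three-point percentage of a player with no three-point attempts. One possible cleaning step, which is an assumption here and not necessarily what this tutorial does later, is to convert the numeric columns with `pd.to_numeric`. The tiny DataFrame below is made up for illustration.

```python
import pandas as pd

# Made-up rows mimicking the scraped string values.
raw = pd.DataFrame(
    {
        "fg percentage": [".501", ".649"],
        "3pt percentage": [".268", ""],
        "team": ["TOR", "LAC"],
    },
    index=["Player A", "Player B"],
)

numeric_cols = ["fg percentage", "3pt percentage"]
cleaned = raw.copy()
# errors="coerce" turns values that cannot be parsed (such as "") into NaN
cleaned[numeric_cols] = cleaned[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(cleaned.dtypes)
print(cleaned)
```

Leaving the empty cell as NaN keeps the missing value visible instead of silently treating it as zero.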

Creating DataFrames from Dictionaries

First, we convert the scraped dictionaries into Pandas DataFrames. Each dictionary represents the data for a specific season, and we use pd.DataFrame.from_dict to perform the conversion. The orient='index' parameter ensures that the dictionary keys become the index of the DataFrame.
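
To see what `orient='index'` does, here is a small sketch with a hypothetical two-player dictionary shaped like the scraper's output (the names and values are made up):

```python
import pandas as pd

# Hypothetical miniature version of the dictionary returned by the scraper.
season = {
    "Player A": {"position": "PG", "team": "ATL", "ppg": "25.7"},
    "Player B": {"position": "C", "team": "LAC", "ppg": "11.7"},
}

frame = pd.DataFrame.from_dict(season, orient="index")
print(frame)
print(list(frame.index))    # outer keys become the row labels
print(list(frame.columns))  # inner keys become the column names
```

Each player becomes one row, which is the layout needed for comparing players within a season.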

Display Settings

To ensure we can view the entire contents of the DataFrames, we adjust the display settings of Pandas to show all rows and columns. This helps in verifying the completeness and correctness of our data.
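
`pd.set_option` changes the setting for the rest of the session. As an alternative sketch, not what this notebook does, `pd.option_context` applies the same options only inside a `with` block. The 100-row DataFrame is a stand-in for real data.

```python
import pandas as pd

frame = pd.DataFrame({"value": range(100)})
before = pd.get_option("display.max_rows")

# Truncated view: only a few rows from the top and bottom are rendered.
with pd.option_context("display.max_rows", 10):
    short_text = repr(frame)

# Full view: None removes the limits, but only inside this block.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    full_text = repr(frame)

print(len(short_text.splitlines()), "lines when truncated")
print(len(full_text.splitlines()), "lines when shown in full")
# After the blocks, the earlier setting is in effect again.
print(pd.get_option("display.max_rows") == before)
```

This keeps large printouts limited to the cells where they are actually wanted.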

Displaying DataFrames

We define functions to display the head and tail of each DataFrame. This provides a quick overview of the data and helps in verifying that the data has been loaded correctly.

In [81]:
# creating dataframe based off created dictionary
data2324 = pd.DataFrame.from_dict(first_season, orient='index')
data2223 = pd.DataFrame.from_dict(second_season, orient='index')
data2122 = pd.DataFrame.from_dict(third_season, orient='index')
data2021 = pd.DataFrame.from_dict(fourth_season, orient='index')
data1920 = pd.DataFrame.from_dict(fifth_season, orient='index')
data1819 = pd.DataFrame.from_dict(sixth_season, orient='index')

mipdata2324 = pd.DataFrame.from_dict(mipTable2324, orient='index')
mipdata2223 = pd.DataFrame.from_dict(mipTable2223, orient='index')
mipdata2122 = pd.DataFrame.from_dict(mipTable2122, orient='index')
mipdata2021 = pd.DataFrame.from_dict(mipTable2021, orient='index')
mipdata1920 = pd.DataFrame.from_dict(mipTable1920, orient='index')

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

count = 24
mipcount = 24
# now printing dataframe data
def data_display(dataframe):
    global count
    print("\n" + "20" + str(count - 1) + "-" + str(count) + " SEASON")
    print(dataframe.head())
    print(dataframe.tail())
    count -= 1

def mip_display(dataframe):
    global mipcount
    print("\n" + "20" + str(mipcount - 1) + "-" + str(mipcount) + " SEASON")
    print(dataframe.head())
    print(dataframe.tail())
    mipcount -= 1

data_display(data2324)
data_display(data2223)
data_display(data2122)
data_display(data2021)
data_display(data1920)
data_display(data1819)

mip_display(mipdata2324)
mip_display(mipdata2223)
mip_display(mipdata2122)
mip_display(mipdata2021)
mip_display(mipdata1920)
2023-24 SEASON
                         position age team games played games started  \
Precious Achiuwa             PF-C  24  TOR           74            18   
Bam Adebayo                     C  26  MIA           71            71   
Ochai Agbaji                   SG  23  UTA           78            28   
Santi Aldama                   PF  23  MEM           61            35   
Nickeil Alexander-Walker       SG  25  MIN           82            20   

                         minutes played per game field goals  \
Precious Achiuwa                            21.9         3.2   
Bam Adebayo                                 34.0         7.5   
Ochai Agbaji                                21.0         2.3   
Santi Aldama                                26.5         4.0   
Nickeil Alexander-Walker                    23.4         2.9   

                         field goal attempts fg percentage 3pt per game  \
Precious Achiuwa                         6.3          .501          0.4   
Bam Adebayo                             14.3          .521          0.2   
Ochai Agbaji                             5.6          .411          0.8   
Santi Aldama                             9.3          .435          1.7   
Nickeil Alexander-Walker                 6.6          .439          1.6   

                         3pt attempts 3pt percentage 2pt per game  \
Precious Achiuwa                  1.3           .268          2.8   
Bam Adebayo                       0.6           .357          7.3   
Ochai Agbaji                      2.7           .294          1.5   
Santi Aldama                      5.0           .349          2.3   
Nickeil Alexander-Walker          4.1           .391          1.3   

                         2pt attempts 2pt percentage effective fg percentage  \
Precious Achiuwa                  5.0           .562                    .529   
Bam Adebayo                      13.7           .528                    .529   
Ochai Agbaji                      2.8           .523                    .483   
Santi Aldama                      4.3           .534                    .528   
Nickeil Alexander-Walker          2.5           .517                    .560   

                         free throws free throw attempts  \
Precious Achiuwa                 0.9                 1.5   
Bam Adebayo                      4.1                 5.5   
Ochai Agbaji                     0.5                 0.7   
Santi Aldama                     0.9                 1.4   
Nickeil Alexander-Walker         0.6                 0.8   

                         free throw percentage offensive rebounds  \
Precious Achiuwa                          .616                2.6   
Bam Adebayo                               .755                2.2   
Ochai Agbaji                              .661                0.9   
Santi Aldama                              .621                1.2   
Nickeil Alexander-Walker                  .800                0.4   

                         defensive rebounds total rebounds assists steals  \
Precious Achiuwa                        4.0            6.6     1.3    0.6   
Bam Adebayo                             8.1           10.4     3.9    1.1   
Ochai Agbaji                            1.8            2.8     1.1    0.6   
Santi Aldama                            4.6            5.8     2.3    0.7   
Nickeil Alexander-Walker                1.6            2.0     2.5    0.8   

                         blocks turnovers personal fouls   ppg  
Precious Achiuwa            0.9       1.1            1.9   7.6  
Bam Adebayo                 0.9       2.3            2.2  19.3  
Ochai Agbaji                0.6       0.8            1.5   5.8  
Santi Aldama                0.9       1.1            1.5  10.7  
Nickeil Alexander-Walker    0.5       0.9            1.7   8.0  
               position age team games played games started  \
Thaddeus Young       PF  35  TOR           33             6   
Trae Young           PG  25  ATL           54            54   
Omer Yurtseven        C  25  UTA           48            12   
Cody Zeller           C  31  NOP           43             0   
Ivica Zubac           C  26  LAC           68            68   

               minutes played per game field goals field goal attempts  \
Thaddeus Young                    13.3         2.0                 3.3   
Trae Young                        36.0         8.0                18.7   
Omer Yurtseven                    11.4         2.1                 3.8   
Cody Zeller                        7.4         0.6                 1.4   
Ivica Zubac                       26.4         5.0                 7.6   

               fg percentage 3pt per game 3pt attempts 3pt percentage  \
Thaddeus Young          .602          0.0          0.2           .143   
Trae Young              .430          3.2          8.7           .373   
Omer Yurtseven          .538          0.1          0.5           .208   
Cody Zeller             .419          0.0          0.1           .333   
Ivica Zubac             .649          0.0          0.0                  

               2pt per game 2pt attempts 2pt percentage  \
Thaddeus Young          1.9          3.1           .634   
Trae Young              4.8         10.0           .479   
Omer Yurtseven          2.0          3.3           .588   
Cody Zeller             0.6          1.4           .424   
Ivica Zubac             5.0          7.6           .649   

               effective fg percentage free throws free throw attempts  \
Thaddeus Young                    .606         0.2                 0.5   
Trae Young                        .516         6.4                 7.5   
Omer Yurtseven                    .552         0.4                 0.6   
Cody Zeller                       .427         0.5                 0.9   
Ivica Zubac                       .649         1.8                 2.4   

               free throw percentage offensive rebounds defensive rebounds  \
Thaddeus Young                  .400                1.4                1.7   
Trae Young                      .855                0.4                2.3   
Omer Yurtseven                  .679                1.5                2.8   
Cody Zeller                     .605                1.1                1.5   
Ivica Zubac                     .723                2.9                6.3   

               total rebounds assists steals blocks turnovers personal fouls  \
Thaddeus Young            3.1     1.7    0.7    0.2       0.5            1.5   
Trae Young                2.8    10.8    1.3    0.2       4.4            2.0   
Omer Yurtseven            4.3     0.6    0.2    0.4       0.8            1.1   
Cody Zeller               2.6     0.9    0.2    0.1       0.4            1.0   
Ivica Zubac               9.2     1.4    0.3    1.2       1.2            2.6   

                 ppg  
Thaddeus Young   4.2  
Trae Young      25.7  
Omer Yurtseven   4.6  
Cody Zeller      1.8  
Ivica Zubac     11.7  

2022-23 SEASON
                 position age team games played games started  \
Precious Achiuwa        C  23  TOR           55            12   
Steven Adams            C  29  MEM           42            42   
Bam Adebayo             C  25  MIA           75            75   
Ochai Agbaji           SG  22  UTA           59            22   
Santi Aldama           PF  22  MEM           77            20   

                 minutes played per game field goals field goal attempts  \
Precious Achiuwa                    20.7         3.6                 7.3   
Steven Adams                        27.0         3.7                 6.3   
Bam Adebayo                         34.6         8.0                14.9   
Ochai Agbaji                        20.5         2.8                 6.5   
Santi Aldama                        21.8         3.2                 6.8   

                 fg percentage 3pt per game 3pt attempts 3pt percentage  \
Precious Achiuwa          .485          0.5          2.0           .269   
Steven Adams              .597          0.0          0.0           .000   
Bam Adebayo               .540          0.0          0.2           .083   
Ochai Agbaji              .427          1.4          3.9           .355   
Santi Aldama              .470          1.2          3.5           .353   

                 2pt per game 2pt attempts 2pt percentage  \
Precious Achiuwa          3.0          5.4           .564   
Steven Adams              3.7          6.2           .599   
Bam Adebayo               8.0         14.7           .545   
Ochai Agbaji              1.4          2.7           .532   
Santi Aldama              2.0          3.4           .591   

                 effective fg percentage free throws free throw attempts  \
Precious Achiuwa                    .521         1.6                 2.3   
Steven Adams                        .597         1.1                 3.1   
Bam Adebayo                         .541         4.3                 5.4   
Ochai Agbaji                        .532         0.9                 1.2   
Santi Aldama                        .560         1.4                 1.9   

                 free throw percentage offensive rebounds defensive rebounds  \
Precious Achiuwa                  .702                1.8                4.1   
Steven Adams                      .364                5.1                6.5   
Bam Adebayo                       .806                2.5                6.7   
Ochai Agbaji                      .812                0.7                1.3   
Santi Aldama                      .750                1.1                3.7   

                 total rebounds assists steals blocks turnovers  \
Precious Achiuwa            6.0     0.9    0.6    0.5       1.1   
Steven Adams               11.5     2.3    0.9    1.1       1.9   
Bam Adebayo                 9.2     3.2    1.2    0.8       2.5   
Ochai Agbaji                2.1     1.1    0.3    0.3       0.7   
Santi Aldama                4.8     1.3    0.6    0.6       0.8   

                 personal fouls   ppg  
Precious Achiuwa            1.9   9.2  
Steven Adams                2.3   8.6  
Bam Adebayo                 2.8  20.4  
Ochai Agbaji                1.7   7.9  
Santi Aldama                1.9   9.0  
               position age team games played games started  \
Thaddeus Young       PF  34  TOR           54             9   
Trae Young           PG  24  ATL           73            73   
Omer Yurtseven        C  24  MIA            9             0   
Cody Zeller           C  30  MIA           15             2   
Ivica Zubac           C  25  LAC           76            76   

               minutes played per game field goals field goal attempts  \
Thaddeus Young                    14.7         2.0                 3.7   
Trae Young                        34.8         8.2                19.0   
Omer Yurtseven                     9.2         1.8                 3.0   
Cody Zeller                       14.5         2.5                 3.9   
Ivica Zubac                       28.6         4.3                 6.8   

               fg percentage 3pt per game 3pt attempts 3pt percentage  \
Thaddeus Young          .545          0.1          0.6           .176   
Trae Young              .429          2.1          6.3           .335   
Omer Yurtseven          .593          0.3          0.8           .429   
Cody Zeller             .627          0.0          0.1           .000   
Ivica Zubac             .634          0.0          0.0           .000   

               2pt per game 2pt attempts 2pt percentage  \
Thaddeus Young          1.9          3.0           .622   
Trae Young              6.1         12.7           .476   
Omer Yurtseven          1.4          2.2           .650   
Cody Zeller             2.5          3.8           .649   
Ivica Zubac             4.3          6.7           .637   

               effective fg percentage free throws free throw attempts  \
Thaddeus Young                    .561         0.3                 0.5   
Trae Young                        .485         7.8                 8.8   
Omer Yurtseven                    .648         0.6                 0.7   
Cody Zeller                       .627         1.6                 2.3   
Ivica Zubac                       .634         2.2                 3.1   

               free throw percentage offensive rebounds defensive rebounds  \
Thaddeus Young                  .692                1.3                1.8   
Trae Young                      .886                0.8                2.2   
Omer Yurtseven                  .833                0.9                1.7   
Cody Zeller                     .686                1.7                2.6   
Ivica Zubac                     .697                3.1                6.8   

               total rebounds assists steals blocks turnovers personal fouls  \
Thaddeus Young            3.1     1.4    1.0    0.1       0.8            1.6   
Trae Young                3.0    10.2    1.1    0.1       4.1            1.4   
Omer Yurtseven            2.6     0.2    0.2    0.2       0.4            1.8   
Cody Zeller               4.3     0.7    0.2    0.3       0.9            2.2   
Ivica Zubac               9.9     1.0    0.4    1.3       1.5            2.9   

                 ppg  
Thaddeus Young   4.4  
Trae Young      26.2  
Omer Yurtseven   4.4  
Cody Zeller      6.5  
Ivica Zubac     10.8  

2021-22 SEASON
                  position age team games played games started  \
Precious Achiuwa         C  22  TOR           73            28   
Steven Adams             C  28  MEM           76            75   
Bam Adebayo              C  24  MIA           56            56   
Santi Aldama            PF  21  MEM           32             0   
LaMarcus Aldridge        C  36  BRK           47            12   

                  minutes played per game field goals field goal attempts  \
Precious Achiuwa                     23.6         3.6                 8.3   
Steven Adams                         26.3         2.8                 5.1   
Bam Adebayo                          32.6         7.3                13.0   
Santi Aldama                         11.3         1.7                 4.1   
LaMarcus Aldridge                    22.3         5.4                 9.7   

                  fg percentage 3pt per game 3pt attempts 3pt percentage  \
Precious Achiuwa           .439          0.8          2.1           .359   
Steven Adams               .547          0.0          0.0           .000   
Bam Adebayo                .557          0.0          0.1           .000   
Santi Aldama               .402          0.2          1.5           .125   
LaMarcus Aldridge          .550          0.3          1.0           .304   

                  2pt per game 2pt attempts 2pt percentage  \
Precious Achiuwa           2.9          6.1           .468   
Steven Adams               2.8          5.0           .548   
Bam Adebayo                7.3         12.9           .562   
Santi Aldama               1.5          2.6           .560   
LaMarcus Aldridge          5.1          8.8           .578   

                  effective fg percentage free throws free throw attempts  \
Precious Achiuwa                     .486         1.1                 1.8   
Steven Adams                         .547         1.4                 2.6   
Bam Adebayo                          .557         4.6                 6.1   
Santi Aldama                         .424         0.6                 1.0   
LaMarcus Aldridge                    .566         1.9                 2.2   

                  free throw percentage offensive rebounds defensive rebounds  \
Precious Achiuwa                   .595                2.0                4.5   
Steven Adams                       .543                4.6                5.4   
Bam Adebayo                        .753                2.4                7.6   
Santi Aldama                       .625                1.0                1.7   
LaMarcus Aldridge                  .873                1.6                3.9   

                  total rebounds assists steals blocks turnovers  \
Precious Achiuwa             6.5     1.1    0.5    0.6       1.2   
Steven Adams                10.0     3.4    0.9    0.8       1.5   
Bam Adebayo                 10.1     3.4    1.4    0.8       2.6   
Santi Aldama                 2.7     0.7    0.2    0.3       0.5   
LaMarcus Aldridge            5.5     0.9    0.3    1.0       0.9   

                  personal fouls   ppg  
Precious Achiuwa             2.1   9.1  
Steven Adams                 2.0   6.9  
Bam Adebayo                  3.1  19.1  
Santi Aldama                 1.1   4.1  
LaMarcus Aldridge            1.7  12.9  
               position age team games played games started  \
Thaddeus Young       PF  33  SAS           52             1   
Trae Young           PG  23  ATL           76            76   
Omer Yurtseven        C  23  MIA           56            12   
Cody Zeller           C  29  POR           27             0   
Ivica Zubac           C  24  LAC           76            76   

               minutes played per game field goals field goal attempts  \
Thaddeus Young                    16.3         2.7                 5.2   
Trae Young                        34.9         9.4                20.3   
Omer Yurtseven                    12.6         2.3                 4.4   
Cody Zeller                       13.1         1.9                 3.3   
Ivica Zubac                       24.4         4.1                 6.5   

               fg percentage 3pt per game 3pt attempts 3pt percentage  \
Thaddeus Young          .518          0.3          0.9           .354   
Trae Young              .460          3.1          8.0           .382   
Omer Yurtseven          .526          0.0          0.2           .091   
Cody Zeller             .567          0.0          0.1           .000   
Ivica Zubac             .626          0.0          0.0                  

               2pt per game 2pt attempts 2pt percentage  \
Thaddeus Young          2.4          4.3           .554   
Trae Young              6.3         12.3           .512   
Omer Yurtseven          2.3          4.2           .547   
Cody Zeller             1.9          3.2           .593   
Ivica Zubac             4.1          6.5           .626   

               effective fg percentage free throws free throw attempts  \
Thaddeus Young                    .550         0.4                 0.9   
Trae Young                        .536         6.6                 7.3   
Omer Yurtseven                    .528         0.7                 1.1   
Cody Zeller                       .567         1.4                 1.8   
Ivica Zubac                       .626         2.2                 3.0   

               free throw percentage offensive rebounds defensive rebounds  \
Thaddeus Young                  .469                1.5                2.5   
Trae Young                      .904                0.7                3.1   
Omer Yurtseven                  .623                1.5                3.7   
Cody Zeller                     .776                1.9                2.8   
Ivica Zubac                     .727                2.9                5.6   

               total rebounds assists steals blocks turnovers personal fouls  \
Thaddeus Young            4.0     2.0    1.0    0.3       1.0            1.6   
Trae Young                3.7     9.7    0.9    0.1       4.0            1.7   
Omer Yurtseven            5.3     0.9    0.3    0.4       0.7            1.5   
Cody Zeller               4.6     0.8    0.3    0.2       0.7            2.1   
Ivica Zubac               8.5     1.6    0.5    1.0       1.5            2.7   

                 ppg  
Thaddeus Young   6.2  
Trae Young      28.4  
Omer Yurtseven   5.3  
Cody Zeller      5.2  
Ivica Zubac     10.3  

2020-21 SEASON
                  position age team games played games started  \
Precious Achiuwa        PF  21  MIA           61             4   
Jaylen Adams            PG  24  MIL            7             0   
Steven Adams             C  27  NOP           58            58   
Bam Adebayo              C  23  MIA           64            64   
LaMarcus Aldridge        C  35  SAS           26            23   

                  minutes played per game field goals field goal attempts  \
Precious Achiuwa                     12.1         2.0                 3.7   
Jaylen Adams                          2.6         0.1                 1.1   
Steven Adams                         27.7         3.3                 5.3   
Bam Adebayo                          33.5         7.1                12.5   
LaMarcus Aldridge                    25.9         5.4                11.4   

                  fg percentage 3pt per game 3pt attempts 3pt percentage  \
Precious Achiuwa           .544          0.0          0.0           .000   
Jaylen Adams               .125          0.0          0.3           .000   
Steven Adams               .614          0.0          0.1           .000   
Bam Adebayo                .570          0.0          0.1           .250   
LaMarcus Aldridge          .473          1.2          3.1           .388   

                  2pt per game 2pt attempts 2pt percentage  \
Precious Achiuwa           2.0          3.7           .546   
Jaylen Adams               0.1          0.9           .167   
Steven Adams               3.3          5.3           .620   
Bam Adebayo                7.1         12.4           .573   
LaMarcus Aldridge          4.2          8.3           .505   

                  effective fg percentage free throws free throw attempts  \
Precious Achiuwa                     .544         0.9                 1.8   
Jaylen Adams                         .125         0.0                 0.0   
Steven Adams                         .614         1.0                 2.3   
Bam Adebayo                          .571         4.4                 5.5   
LaMarcus Aldridge                    .525         1.6                 1.8   

                  free throw percentage offensive rebounds defensive rebounds  \
Precious Achiuwa                   .509                1.2                2.2   
Jaylen Adams                                           0.0                0.4   
Steven Adams                       .444                3.7                5.2   
Bam Adebayo                        .799                2.2                6.7   
LaMarcus Aldridge                  .872                0.7                3.8   

                  total rebounds assists steals blocks turnovers  \
Precious Achiuwa             3.4     0.5    0.3    0.5       0.7   
Jaylen Adams                 0.4     0.3    0.0    0.0       0.0   
Steven Adams                 8.9     1.9    0.9    0.7       1.3   
Bam Adebayo                  9.0     5.4    1.2    1.0       2.6   
LaMarcus Aldridge            4.5     1.9    0.4    1.1       1.0   

                  personal fouls   ppg  
Precious Achiuwa             1.5   5.0  
Jaylen Adams                 0.1   0.3  
Steven Adams                 1.9   7.6  
Bam Adebayo                  2.3  18.7  
LaMarcus Aldridge            1.8  13.5  
               position age team games played games started  \
Delon Wright         PG  28  DET           63            39   
Thaddeus Young       PF  32  CHI           68            23   
Trae Young           PG  22  ATL           63            63   
Cody Zeller           C  28  CHO           48            21   
Ivica Zubac           C  23  LAC           72            33   

               minutes played per game field goals field goal attempts  \
Delon Wright                      27.7         3.8                 8.2   
Thaddeus Young                    24.3         5.4                 9.7   
Trae Young                        33.7         7.7                17.7   
Cody Zeller                       20.9         3.8                 6.8   
Ivica Zubac                       22.3         3.6                 5.5   

               fg percentage 3pt per game 3pt attempts 3pt percentage  \
Delon Wright            .463          1.0          2.7           .372   
Thaddeus Young          .559          0.2          0.7           .267   
Trae Young              .438          2.2          6.3           .343   
Cody Zeller             .559          0.1          0.6           .143   
Ivica Zubac             .652          0.0          0.1           .250   

               2pt per game 2pt attempts 2pt percentage  \
Delon Wright            2.8          5.5           .509   
Thaddeus Young          5.3          9.1           .580   
Trae Young              5.6         11.3           .491   
Cody Zeller             3.7          6.2           .598   
Ivica Zubac             3.6          5.4           .656   

               effective fg percentage free throws free throw attempts  \
Delon Wright                      .525         1.6                 2.0   
Thaddeus Young                    .568         1.0                 1.7   
Trae Young                        .499         7.7                 8.7   
Cody Zeller                       .565         1.8                 2.5   
Ivica Zubac                       .654         1.9                 2.4   

               free throw percentage offensive rebounds defensive rebounds  \
Delon Wright                    .802                1.0                3.2   
Thaddeus Young                  .628                2.5                3.8   
Trae Young                      .886                0.6                3.3   
Cody Zeller                     .714                2.5                4.4   
Ivica Zubac                     .789                2.6                4.6   

               total rebounds assists steals blocks turnovers personal fouls  \
Delon Wright              4.3     4.4    1.6    0.5       1.3            1.2   
Thaddeus Young            6.2     4.3    1.1    0.6       2.0            2.2   
Trae Young                3.9     9.4    0.8    0.2       4.1            1.8   
Cody Zeller               6.8     1.8    0.6    0.4       1.1            2.5   
Ivica Zubac               7.2     1.3    0.3    0.9       1.1            2.6   

                 ppg  
Delon Wright    10.2  
Thaddeus Young  12.1  
Trae Young      25.3  
Cody Zeller      9.4  
Ivica Zubac      9.0  

2019-20 SEASON
                         position age team games played games started  \
Steven Adams                    C  26  OKC           63            63   
Bam Adebayo                    PF  22  MIA           72            72   
LaMarcus Aldridge               C  34  SAS           53            53   
Kyle Alexander                  C  23  MIA            2             0   
Nickeil Alexander-Walker       SG  21  NOP           47             1   

                         minutes played per game field goals  \
Steven Adams                                26.7         4.5   
Bam Adebayo                                 33.6         6.1   
LaMarcus Aldridge                           33.1         7.4   
Kyle Alexander                               6.5         0.5   
Nickeil Alexander-Walker                    12.6         2.1   

                         field goal attempts fg percentage 3pt per game  \
Steven Adams                             7.6          .592          0.0   
Bam Adebayo                             11.0          .557          0.0   
LaMarcus Aldridge                       15.0          .493          1.2   
Kyle Alexander                           1.0          .500          0.0   
Nickeil Alexander-Walker                 5.7          .368          1.0   

                         3pt attempts 3pt percentage 2pt per game  \
Steven Adams                      0.0           .333          4.5   
Bam Adebayo                       0.2           .143          6.1   
LaMarcus Aldridge                 3.0           .389          6.2   
Kyle Alexander                    0.0                         0.5   
Nickeil Alexander-Walker          2.8           .346          1.1   

                         2pt attempts 2pt percentage effective fg percentage  \
Steven Adams                      7.5           .594                    .593   
Bam Adebayo                      10.8           .564                    .558   
LaMarcus Aldridge                12.0           .519                    .532   
Kyle Alexander                    1.0           .500                    .500   
Nickeil Alexander-Walker          2.8           .391                    .455   

                         free throws free throw attempts  \
Steven Adams                     1.9                 3.2   
Bam Adebayo                      3.7                 5.3   
LaMarcus Aldridge                3.0                 3.6   
Kyle Alexander                   0.0                 0.0   
Nickeil Alexander-Walker         0.5                 0.8   

                         free throw percentage offensive rebounds  \
Steven Adams                              .582                3.3   
Bam Adebayo                               .691                2.4   
LaMarcus Aldridge                         .827                1.9   
Kyle Alexander                                                1.0   
Nickeil Alexander-Walker                  .676                0.2   

                         defensive rebounds total rebounds assists steals  \
Steven Adams                            6.0            9.3     2.3    0.8   
Bam Adebayo                             7.8           10.2     5.1    1.1   
LaMarcus Aldridge                       5.5            7.4     2.4    0.7   
Kyle Alexander                          0.5            1.5     0.0    0.0   
Nickeil Alexander-Walker                1.6            1.8     1.9    0.4   

                         blocks turnovers personal fouls   ppg  
Steven Adams                1.1       1.5            1.9  10.9  
Bam Adebayo                 1.3       2.8            2.5  15.9  
LaMarcus Aldridge           1.6       1.4            2.4  18.9  
Kyle Alexander              0.0       0.5            0.5   1.0  
Nickeil Alexander-Walker    0.2       1.1            1.2   5.7  
             position age team games played games started  \
Trae Young         PG  21  ATL           60            60   
Cody Zeller         C  27  CHO           58            39   
Tyler Zeller        C  30  SAS            2             0   
Ante Žižić          C  23  CLE           22             0   
Ivica Zubac         C  22  LAC           72            70   

             minutes played per game field goals field goal attempts  \
Trae Young                      35.3         9.1                20.8   
Cody Zeller                     23.1         4.3                 8.3   
Tyler Zeller                     2.0         0.5                 2.0   
Ante Žižić                      10.0         1.9                 3.3   
Ivica Zubac                     18.4         3.3                 5.3   

             fg percentage 3pt per game 3pt attempts 3pt percentage  \
Trae Young            .437          3.4          9.5           .361   
Cody Zeller           .524          0.3          1.3           .240   
Tyler Zeller          .250          0.0          0.0                  
Ante Žižić            .569          0.0          0.0                  
Ivica Zubac           .613          0.0          0.0           .000   

             2pt per game 2pt attempts 2pt percentage effective fg percentage  \
Trae Young            5.7         11.4           .501                    .519   
Cody Zeller           4.0          7.0           .577                    .543   
Tyler Zeller          0.5          2.0           .250                    .250   
Ante Žižić            1.9          3.3           .569                    .569   
Ivica Zubac           3.3          5.3           .616                    .613   

             free throws free throw attempts free throw percentage  \
Trae Young           8.0                 9.3                  .860   
Cody Zeller          2.1                 3.1                  .682   
Tyler Zeller         0.0                 0.0                         
Ante Žižić           0.6                 0.9                  .737   
Ivica Zubac          1.7                 2.3                  .747   

             offensive rebounds defensive rebounds total rebounds assists  \
Trae Young                  0.5                3.7            4.3     9.3   
Cody Zeller                 2.8                4.3            7.1     1.5   
Tyler Zeller                1.5                0.5            2.0     0.0   
Ante Žižić                  0.8                2.2            3.0     0.3   
Ivica Zubac                 2.7                4.8            7.5     1.1   

             steals blocks turnovers personal fouls   ppg  
Trae Young      1.1    0.1       4.8            1.7  29.6  
Cody Zeller     0.7    0.4       1.3            2.4  11.1  
Tyler Zeller    0.0    0.0       0.0            0.0   1.0  
Ante Žižić      0.3    0.2       0.5            1.2   4.4  
Ivica Zubac     0.2    0.9       0.8            2.3   8.3  

2018-19 SEASON
             position age team games played games started  \
Álex Abrines       SG  25  OKC           31             2   
Quincy Acy         PF  28  PHO           10             0   
Jaylen Adams       PG  22  ATL           34             1   
Steven Adams        C  25  OKC           80            80   
Bam Adebayo         C  21  MIA           82            28   

             minutes played per game field goals field goal attempts  \
Álex Abrines                    19.0         1.8                 5.1   
Quincy Acy                      12.3         0.4                 1.8   
Jaylen Adams                    12.6         1.1                 3.2   
Steven Adams                    33.4         6.0                10.1   
Bam Adebayo                     23.3         3.4                 5.9   

             fg percentage 3pt per game 3pt attempts 3pt percentage  \
Álex Abrines          .357          1.3          4.1           .323   
Quincy Acy            .222          0.2          1.5           .133   
Jaylen Adams          .345          0.7          2.2           .338   
Steven Adams          .595          0.0          0.0           .000   
Bam Adebayo           .576          0.0          0.2           .200   

             2pt per game 2pt attempts 2pt percentage effective fg percentage  \
Álex Abrines          0.5          1.0           .500                    .487   
Quincy Acy            0.2          0.3           .667                    .278   
Jaylen Adams          0.4          1.1           .361                    .459   
Steven Adams          6.0         10.1           .596                    .595   
Bam Adebayo           3.4          5.7           .588                    .579   

             free throws free throw attempts free throw percentage  \
Álex Abrines         0.4                 0.4                  .923   
Quincy Acy           0.7                 1.0                  .700   
Jaylen Adams         0.2                 0.3                  .778   
Steven Adams         1.8                 3.7                  .500   
Bam Adebayo          2.0                 2.8                  .735   

             offensive rebounds defensive rebounds total rebounds assists  \
Álex Abrines                0.2                1.4            1.5     0.6   
Quincy Acy                  0.3                2.2            2.5     0.8   
Jaylen Adams                0.3                1.4            1.8     1.9   
Steven Adams                4.9                4.6            9.5     1.6   
Bam Adebayo                 2.0                5.3            7.3     2.2   

             steals blocks turnovers personal fouls   ppg  
Álex Abrines    0.5    0.2       0.5            1.7   5.3  
Quincy Acy      0.1    0.4       0.4            2.4   1.7  
Jaylen Adams    0.4    0.1       0.8            1.3   3.2  
Steven Adams    1.5    1.0       1.7            2.6  13.9  
Bam Adebayo     0.9    0.8       1.5            2.5   8.9  
             position age team games played games started  \
Trae Young         PG  20  ATL           81            81   
Cody Zeller         C  26  CHO           49            47   
Tyler Zeller        C  29  ATL            6             1   
Ante Žižić          C  22  CLE           59            25   
Ivica Zubac         C  21  LAL           59            37   

             minutes played per game field goals field goal attempts  \
Trae Young                      30.9         6.5                15.5   
Cody Zeller                     25.4         3.9                 7.0   
Tyler Zeller                    15.5         2.7                 5.0   
Ante Žižić                      18.3         3.1                 5.6   
Ivica Zubac                     17.6         3.6                 6.4   

             fg percentage 3pt per game 3pt attempts 3pt percentage  \
Trae Young            .418          1.9          6.0           .324   
Cody Zeller           .551          0.1          0.4           .273   
Tyler Zeller          .533          0.0          0.2           .000   
Ante Žižić            .553          0.0          0.0                  
Ivica Zubac           .559          0.0          0.0                  

             2pt per game 2pt attempts 2pt percentage effective fg percentage  \
Trae Young            4.6          9.6           .477                    .480   
Cody Zeller           3.8          6.6           .570                    .559   
Tyler Zeller          2.7          4.8           .552                    .533   
Ante Žižić            3.1          5.6           .553                    .553   
Ivica Zubac           3.6          6.4           .559                    .559   

             free throws free throw attempts free throw percentage  \
Trae Young           4.2                 5.1                  .829   
Cody Zeller          2.3                 2.9                  .787   
Tyler Zeller         2.3                 3.0                  .778   
Ante Žižić           1.6                 2.2                  .705   
Ivica Zubac          1.7                 2.1                  .802   

             offensive rebounds defensive rebounds total rebounds assists  \
Trae Young                  0.8                2.9            3.7     8.1   
Cody Zeller                 2.2                4.6            6.8     2.1   
Tyler Zeller                1.8                2.2            4.0     0.7   
Ante Žižić                  1.8                3.6            5.4     0.9   
Ivica Zubac                 1.9                4.2            6.1     1.1   

             steals blocks turnovers personal fouls   ppg  
Trae Young      0.9    0.2       3.8            1.7  19.1  
Cody Zeller     0.8    0.8       1.3            3.3  10.1  
Tyler Zeller    0.2    0.5       0.7            3.3   7.7  
Ante Žižić      0.2    0.4       1.0            1.9   7.8  
Ivica Zubac     0.2    0.9       1.2            2.3   8.9  

2023-24 SEASON
               Rank
Tyrese Maxey      1
Coby White        2
Alperen Sengun    3
Jalen Williams    4
Jalen Brunson     5
                        Rank
Grayson Allen             10
Duncan Robinson           10
Shai Gilgeous-Alexander   12
Devin Vassell             12
Aaron Nesmith             14

2022-23 SEASON
                        Rank
Lauri Markkanen            1
Shai Gilgeous-Alexander    2
Jalen Brunson              3
Mikal Bridges              4
Nic Claxton                5
                  Rank
Kevon Looney         8
Austin Reaves       10
Aaron Gordon        11
Jaren Jackson Jr.   11
Malik Monk          11

2021-22 SEASON
                Rank
Ja Morant          1
Dejounte Murray    2
Darius Garland     3
Jordan Poole       4
Desmond Bane       5
                  Rank
Anfernee Simons      8
Robert Williams      9
Jaren Jackson Jr.   10
Jalen Brunson       11
Max Strus           12

2020-21 SEASON
                   Rank
Julius Randle         1
Jerami Grant          2
Michael Porter Jr.    3
Christian Wood        4
Zach LaVine           5
                        Rank
Shai Gilgeous-Alexander   19
Richaun Holmes            19
T.J. McConnell            19
Terry Rozier              19
Andrew Wiggins            19

2019-20 SEASON
                Rank
Brandon Ingram     1
Bam Adebayo        2
Luka Dončić        3
Jayson Tatum       4
Devonte' Graham    5
                  Rank
Dāvis Bertāns       11
Jaylen Brown        11
Markelle Fultz      13
Spencer Dinwiddie   14
Duncan Robinson     14

Data Processing Second Part

In this part of the data processing section, we extract the Most Improved Player (MIP) winner from each season's MIP rankings and store the winners in a list. This gives us a record of who won the award in each of the past five seasons, which we need in order to compare the winners' performance metrics against those of other players. With the winners identified, the data is ready for the exploration and modeling that follow.

In [82]:
pastMipWinners = []

def getMipWinners(mipYear, mipArray):
    # Each rankings frame is indexed by player name and ordered by rank,
    # so the first index label is that season's winner
    mipArray.append(mipYear.index[0])
    return mipArray

# Collect the winners in chronological order (2019-20 through 2023-24)
pastMipWinners = getMipWinners(mipdata1920, pastMipWinners)
pastMipWinners = getMipWinners(mipdata2021, pastMipWinners)
pastMipWinners = getMipWinners(mipdata2122, pastMipWinners)
pastMipWinners = getMipWinners(mipdata2223, pastMipWinners)
pastMipWinners = getMipWinners(mipdata2324, pastMipWinners)
print(pastMipWinners)
['Brandon Ingram', 'Julius Randle', 'Ja Morant', 'Lauri Markkanen', 'Tyrese Maxey']
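
The winners can also be kept keyed by season, which makes later lookups less dependent on list position. The following is a minimal sketch, not the notebook's own code: the two small frames are stand-ins built by hand from the top rows of the rankings printed above, and the names `mip_1920`, `mip_2021`, `rankings_by_season`, and `winners_by_season` are hypothetical.

```python
import pandas as pd

# Stand-in frames copied from the top of the printed rankings (not scraped here)
mip_1920 = pd.DataFrame({"Rank": [1, 2, 3]},
                        index=["Brandon Ingram", "Bam Adebayo", "Luka Dončić"])
mip_2021 = pd.DataFrame({"Rank": [1, 2, 3]},
                        index=["Julius Randle", "Jerami Grant", "Michael Porter Jr."])

# Hypothetical mapping from season label to that season's rankings frame
rankings_by_season = {"19-20": mip_1920, "20-21": mip_2021}

# idxmin on Rank returns the index label of the lowest rank,
# so the winner is found even if the rows are not sorted
winners_by_season = {
    season: frame["Rank"].astype(int).idxmin()
    for season, frame in rankings_by_season.items()
}
print(winners_by_season)
```

With the real frames, the same comprehension would only need `rankings_by_season` to point at `mipdata1920` through `mipdata2324`, assuming their `Rank` column converts cleanly to integers.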

Create correlation graphs that are useful for determining the Most Improved Player:

  1. Pure offensive stats of MIP winners (PPG, assists, offensive rebounds, FG%)
  2. Pure defensive stats of MIP winners (steals, blocks, defensive rebounds)
  3. Impact on the game (games played and minutes increase over seasons, FGA vs FGA)
  4. Prediction of the Most Improved Player for the 2024-2025 season based on coefficients
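
Before drawing any correlation graph, the pairwise correlations between season-over-season stat changes can be computed directly. The sketch below is illustrative only: the frame `changes` is a hypothetical name, and its five rows are copied by hand from the 2018-19 to 2019-20 difference table printed later in this section, so the resulting numbers say nothing about the full data set.

```python
import pandas as pd

# Five rows copied from the printed 2018-19 -> 2019-20 difference table
changes = pd.DataFrame(
    {
        "ppg": [5.5, 7.0, -1.6, 3.7, -9.0],
        "assists": [1.2, 2.9, 0.0, 1.2, -2.1],
        "offensive rebounds": [0.0, 0.4, 0.0, 0.1, -0.4],
    },
    index=["Brandon Ingram", "Bam Adebayo", "Aaron Gordon",
           "Andrew Wiggins", "Blake Griffin"],
)

# Pearson correlation between every pair of stat changes
corr = changes.corr()
print(corr.round(2))
```

Passing `corr` to `sns.heatmap` would turn the same matrix into a graph.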

Past 5 winners (season, player, position, team, FG%, PPG, RPG, APG, BPG)

  1. 2024 Tyrese Maxey PG Philadelphia 76ers .450 25.9 3.7 6.2 0.5
  2. 2023 Lauri Markkanen PF Utah Jazz .499 25.6 8.6 1.9 0.6
  3. 2022 Ja Morant PG Memphis Grizzlies .493 27.4 5.7 6.7 0.4
  4. 2021 Julius Randle PF New York Knicks .456 24.1 10.2 6.0 0.3
  5. 2020 Brandon Ingram F New Orleans Pelicans .463 23.8 6.1 4.2 0.6

Next, we create graphs that show the growth or decline in offensive performance, measured as Points Per Game (PPG), from one season to the next. We first clean the data again, because a player must have been in the NBA for two consecutive seasons to have a difference; any player who was not is dropped from the table. We then calculate each player's PPG difference and plot it. In each graph, the player with the highest growth in PPG is highlighted in green and that season's Most Improved Player is highlighted in red.
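
The cleaning step relies on pandas aligning rows by player name when one frame is subtracted from another: a player missing from either season gets NaN and is then removed by `dropna`. Here is a small sketch of that behavior, using PPG values copied from the tables printed above; the names `prev`, `curr`, and `diff` are hypothetical.

```python
import numpy as np
import pandas as pd

# PPG values copied from the printed 2018-19 and 2019-20 tables (stored as strings, as scraped)
prev = pd.DataFrame({"ppg": ["8.9", "13.9", "5.3"]},
                    index=["Bam Adebayo", "Steven Adams", "Álex Abrines"])
curr = pd.DataFrame({"ppg": ["15.9", "10.9", "5.7"]},
                    index=["Bam Adebayo", "Steven Adams", "Nickeil Alexander-Walker"])

# Empty strings become NaN, then everything is converted to float
prev = prev.replace("", np.nan).astype(float)
curr = curr.replace("", np.nan).astype(float)

# Subtraction aligns on the index; players present in only one season get NaN
diff = curr.sub(prev)

# Keep only players with a value in both seasons
diff = diff.dropna()
print(diff)
```

Only Bam Adebayo and Steven Adams remain, because the other two players appear in just one of the two seasons.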

In [83]:
# Players missing from either season end up with NaN after the subtraction below and are dropped from the table
# Select the offensive stats and replace empty strings with NaN so the columns can be converted to floats
pureOffensive1819 = data1819[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan) 
pureOffensive1920 = data1920[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)
pureOffensive2021 = data2021[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan) 
pureOffensive2122 = data2122[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)
pureOffensive2223 = data2223[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan) 
pureOffensive2324 = data2324[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)

# Convert each stat to float so one season can be subtracted from the next
pureOffensive1819 = pureOffensive1819.astype(float)
pureOffensive1920 = pureOffensive1920.astype(float)
pureOffensive2021 = pureOffensive2021.astype(float)
pureOffensive2122 = pureOffensive2122.astype(float)
pureOffensive2223 = pureOffensive2223.astype(float)
pureOffensive2324 = pureOffensive2324.astype(float)
# Subtract the previous season from each season so growth (positive) and decline (negative) show up in the data frame
pureOffensive1820 = pureOffensive1920.sub(pureOffensive1819)
pureOffensive1921 = pureOffensive2021.sub(pureOffensive1920)
pureOffensive2022 = pureOffensive2122.sub(pureOffensive2021)
pureOffensive2123 = pureOffensive2223.sub(pureOffensive2122)
pureOffensive2224 = pureOffensive2324.sub(pureOffensive2223)

# print(pureOffensive1820)
allOffense = [pureOffensive1820, pureOffensive1921, pureOffensive2022, pureOffensive2123, pureOffensive2224]
yc = 0
years = ["19-20", "20-21", "21-22", "22-23", "23-24"]

def get_ppg_diff(offense, mip, year):
    # Drop players who did not play in both seasons (their differences are NaN)
    offense.dropna(inplace=True)

    # Sort the DataFrame by the difference in PPG
    sorted_growth_df = offense.sort_values(by='ppg', ascending=False)

    # Plotting the difference in PPG for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['ppg'], color='skyblue')
    plt.title(f'Difference in Points Per Game (PPG) in {year} from last season')
    plt.xlabel('Difference in PPG')
    plt.ylabel('Players')

    # Highlighting the player with the highest difference (first row after sorting, drawn at y = 0)
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'ppg']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest PPG growth Player')

    # Annotating the highest difference next to its bar
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference:.1f}", va='bottom', ha='right', color='green')

    # Highlighting the award player
    if mip in sorted_growth_df.index:
        award_player_difference = sorted_growth_df.loc[mip, 'ppg']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Bars are drawn in sorted order, so the label position comes from the sorted frame
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference:.1f}', va='bottom', ha='right', color='red')

    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()

# Pair each difference table with its season's MIP winner and season label
for offense, mip, year in zip(allOffense, pastMipWinners, years):
    get_ppg_diff(offense, mip, year)
print(pureOffensive1820)
[Figures: five horizontal bar charts titled "Difference in Points Per Game (PPG) in {year} from last season", one each for 19-20, 20-21, 21-22, 22-23, and 23-24; x-axis: Difference in PPG; y-axis: Players]
                           ppg  assists  offensive rebounds  fg percentage
Aaron Gordon              -1.6      0.0                 0.0         -0.012
Aaron Holiday              3.6      1.7                 0.2          0.013
Abdel Nader                2.3      0.4                 0.1          0.045
Al Horford                -1.7     -0.2                -0.3         -0.085
Al-Farouq Aminu           -5.1     -0.1                -0.1         -0.142
Alec Burks                 6.2      0.9                 0.2          0.013
Alex Caruso               -3.7     -1.2                -0.5         -0.033
Alex Len                  -3.1     -0.2                -0.3          0.061
Alfonzo McKinnie          -0.1      0.0                -0.2         -0.060
Alize Johnson              1.1      0.3                 0.6          0.164
Allen Crabbe              -5.0     -0.2                -0.1         -0.011
Allonzo Trier             -4.4     -0.7                -0.2          0.033
Amile Jefferson           -1.5     -0.1                 0.1         -0.268
Andre Drummond             0.4      1.3                -1.0          0.000
Andre Iguodala            -1.1     -0.8                 0.1         -0.068
Andrew Wiggins             3.7      1.2                 0.1          0.035
Anfernee Simons            4.5      0.7                 0.2         -0.045
Ante Žižić                -3.4     -0.6                -1.0          0.016
Anthony Davis              0.2     -0.7                -0.8         -0.014
Anthony Tolliver          -1.4      0.1                 0.4         -0.025
Aron Baynes                5.9      0.5                 0.0          0.009
Austin Rivers              0.7     -0.5                 0.1          0.015
Avery Bradley             -1.3     -1.1                -0.3          0.036
B.J. Johnson              -0.3      0.3                 0.0         -0.219
Bam Adebayo                7.0      2.9                 0.4         -0.019
Ben McLemore               6.2      0.6                 0.1          0.053
Ben Simmons               -0.5      0.3                -0.2          0.017
Bismack Biyombo            3.0      0.3                 0.8         -0.028
Blake Griffin             -9.0     -2.1                -0.4         -0.110
Boban Marjanović          -0.7     -0.4                 0.0         -0.042
Bobby Portis              -4.1      0.1                -1.0          0.006
Bogdan Bogdanović          1.0     -0.4                -0.2          0.022
Bojan Bogdanović           2.2      0.1                 0.2         -0.050
Brad Wanamaker             3.0      0.9                 0.2         -0.028
Bradley Beal               4.9      0.6                -0.2         -0.020
Brandon Goodwin            4.7      0.6                 0.2          0.139
Brandon Ingram             5.5      1.2                 0.0         -0.034
Brandon Knight             0.5      1.0                -0.1         -0.028
Brook Lopez               -0.5      0.3                 0.5         -0.017
Bruce Brown                4.6      2.8                 0.5          0.045
Bruno Caboclo             -5.3     -1.1                -0.4          0.000
Bryn Forbes               -0.6     -0.4                 0.0         -0.039
Buddy Hield               -1.5      0.5                -0.5         -0.029
C.J. Miles                 0.0      0.5                 0.2         -0.038
CJ McCollum                1.2      1.4                -0.2         -0.008
Caleb Swanigan             0.4      0.6                 0.2          0.256
Cameron Payne              4.6      0.3                 0.2          0.055
Caris LeVert               5.0      0.5                 0.2         -0.004
Carmelo Anthony            2.0      1.0                 0.3          0.025
Cedi Osman                -2.0     -0.2                 0.0          0.010
Chandler Hutchison         2.6      0.1                -0.1         -0.002
Chandler Parsons          -4.7     -1.1                 0.0         -0.096
Chasson Randle            -3.8     -0.3                -0.2         -0.419
Cheick Diallo             -1.3      0.0                -0.6          0.028
Chimezie Metu              1.4      0.2                 0.4          0.243
Chris Boucher              3.3      0.3                 1.1          0.025
Chris Chiozza              4.2      2.4                 0.2          0.143
Chris Paul                 2.0     -1.5                -0.2          0.070
Christian Wood             4.9      0.6                 0.9          0.046
Clint Capela              -2.7     -0.2                -0.1         -0.019
Cody Zeller                1.0     -0.6                 0.6         -0.027
Collin Sexton              4.1      0.0                 0.2          0.042
Corey Brewer              -3.9     -0.9                -0.4          0.069
Cory Joseph               -0.1     -0.4                 0.1          0.003
Courtney Lee               0.5     -0.6                 0.0          0.077
Cristiano Felício         -0.1      0.1                 1.2          0.099
D'Angelo Russell           2.0     -0.7                -0.3         -0.008
D.J. Augustin             -1.2     -0.7                -0.1         -0.071
D.J. Wilson               -2.2     -0.4                -0.6         -0.020
Damian Jones               0.2     -0.6                 0.0         -0.036
Damian Lillard             4.2      1.1                -0.4          0.019
Damion Lee                 7.8      2.3                 0.4         -0.024
Damyean Dotson            -4.0     -0.6                -0.3         -0.001
Daniel Theis               3.5      0.7                 0.9          0.017
Danilo Gallinari          -1.1     -0.7                -0.3         -0.025
Danny Green               -2.3     -0.3                 0.0         -0.049
Dante Exum                -2.4     -1.5                -0.1          0.052
Danuel House Jr.           1.1      0.3                 0.3         -0.041
Dario Šarić                0.1      0.3                -0.1          0.039
Daryl Macon               -2.8     -0.6                -0.3         -0.037
David Nwaba               -1.3     -0.7                -0.4          0.040
De'Aaron Fox               3.8     -0.5                 0.2          0.022
De'Anthony Melton          2.6     -0.3                 0.2          0.010
DeAndre Jordan            -2.7     -0.4                -0.8          0.025
DeAndre' Bembry           -2.6     -0.6                 0.1          0.010
DeMar DeRozan              0.9     -0.6                -0.1          0.050
DeMarre Carroll           -7.5     -0.3                -0.5         -0.016
Deandre Ayton              1.9      0.1                 0.8         -0.039
Delon Wright              -1.8      0.0                 0.1          0.028
Dennis Schröder            3.4     -0.1                -0.2          0.055
Dennis Smith Jr.          -8.1     -1.9                 0.0         -0.087
Deonte Burton              0.1      0.1                 0.1         -0.058
Derrick Favors            -2.8      0.4                 0.5          0.031
Derrick Jones Jr.          1.5      0.5                -0.5          0.033
Derrick Rose               0.1      1.3                -0.1          0.008
Derrick White              1.4     -0.4                 0.0         -0.021
Devin Booker               0.0     -0.3                -0.2          0.022
Devonte' Graham           13.5      4.9                 0.5          0.039
Dewayne Dedmon            -5.0     -0.9                -0.2         -0.092
Dillon Brooks              8.7      1.2                 0.4          0.005
Dion Waiters              -0.9     -0.8                 0.0          0.000
Domantas Sabonis           4.4      2.1                 0.5         -0.050
Donovan Mitchell           0.2      0.1                 0.0          0.017
Donte DiVincenzo           4.3      1.2                 0.4          0.052
Dorian Finney-Smith        2.0      0.4                 0.3          0.034
Doug McDermott             3.0      0.2                 0.2         -0.003
Dragan Bender              1.7      0.6                 0.3         -0.001
Draymond Green             0.6     -0.7                -0.4         -0.056
Drew Eubanks               3.1      0.4                 1.0          0.065
Duncan Robinson           10.2      1.1                 0.0          0.079
Dusty Hannahs              2.0     -2.5                 0.0          0.194
Dwayne Bacon              -1.6      0.2                 0.2         -0.127
Dwight Howard             -5.3      0.3                -0.2          0.106
Dwight Powell             -1.2      0.0                 0.1          0.041
Dāvis Bertāns              7.4      0.4                 0.3         -0.016
Džanan Musa                2.7      0.9                 0.4         -0.037
E'Twaun Moore             -3.6     -0.5                -0.1         -0.055
Ed Davis                  -4.0     -0.4                -1.4         -0.138
Edmond Sumner              2.0      1.4                 0.0          0.086
Elfrid Payton             -0.6     -0.4                 0.0          0.005
Elie Okobo                -1.7     -0.3                 0.1          0.005
Emmanuel Mudiay           -7.5     -1.8                -0.3          0.016
Enes Freedom              -5.6     -0.7                -1.0          0.023
Eric Bledsoe              -1.0     -0.1                -0.4         -0.009
Eric Gordon               -1.8     -0.4                 0.0         -0.040
Ersan İlyasova            -0.2      0.0                -0.4          0.028
Evan Fournier              3.4     -0.4                -0.2          0.029
Evan Turner               -3.5     -1.9                -0.1         -0.087
Frank Jackson             -1.8     -0.1                 0.0         -0.029
Frank Kaminsky             1.1      0.6                 0.1         -0.013
Frank Mason III            1.8      1.0                 0.4          0.031
Frank Ntilikina            0.6      0.2                 0.1          0.056
Fred VanVleet              6.6      1.8                 0.0          0.003
Furkan Korkmaz             4.0      0.0                 0.0          0.030
Garrett Temple             2.5      1.1                 0.1         -0.044
Gary Clark                 0.8      0.0                 0.4          0.075
Gary Harris               -2.5     -0.1                -0.2         -0.004
Gary Payton II             0.2      0.4                 0.6         -0.211
Gary Trent Jr.             6.2      0.7                 0.3          0.124
George Hill                1.8      0.8                 0.1          0.064
Georges Niang              1.9      0.1                 0.0         -0.037
Giannis Antetokounmpo      1.8     -0.3                 0.0         -0.025
Glenn Robinson III         7.5      1.1                 0.9          0.066
Goran Dragić               2.5      0.3                -0.1          0.028
Gordon Hayward             6.0      0.7                 0.4          0.034
Gorgui Dieng               1.0      0.3                 0.3         -0.045
Grayson Allen              3.1      0.7                 0.1          0.090
Hamidou Diallo             3.2      0.5                 0.3         -0.009
Harrison Barnes           -1.9      0.7                 0.4          0.040
Harry Giles               -0.1     -0.2                -0.2          0.051
Hassan Whiteside           3.2      0.4                 0.3          0.050
Henry Ellenson            -5.6     -0.6                 0.1         -0.268
Ian Mahinmi                3.3      0.6                 0.7          0.043
Iman Shumpert             -3.3     -0.9                 0.3         -0.046
Isaac Bonga                4.1      0.5                 0.7          0.352
Isaiah Hartenstein         2.8      0.3                 0.5          0.169
Isaiah Thomas              4.1      1.8                -0.1          0.065
Ish Smith                  2.0      1.3                 0.0          0.028
Ivica Zubac               -0.6      0.0                 0.8          0.054
J.J. Barea                -3.2     -1.7                 0.0         -0.007
J.R. Smith                -3.9     -1.4                 0.0         -0.024
JJ Redick                 -2.8     -0.7                -0.1          0.013
JaKarr Sampson           -15.4     -0.4                -0.7          0.054
JaMychal Green            -2.6      0.0                -0.4         -0.054
JaVale McGee              -5.4     -0.2                -0.8          0.013
Jabari Parker             -0.5     -0.6                 0.4          0.017
Jacob Evans                3.1      0.3                 0.1         -0.004
Jae Crowder               -1.4      0.8                 0.0          0.002
Jahlil Okafor             -0.1      0.5                 0.2          0.037
Jake Layman                1.5      0.0                -0.1         -0.056
Jakob Poeltl               0.1      0.6                -0.3         -0.021
Jalen Brunson             -1.1      0.1                 0.1         -0.001
Jamal Crawford            -2.9     -0.6                -0.1          0.103
Jamal Murray               0.3      0.0                -0.1          0.019
James Ennis III           -0.1      0.2                 0.0         -0.023
James Harden              -1.8      0.0                 0.2          0.002
James Johnson              0.6     -0.2                 0.4          0.046
Jared Dudley              -3.4     -0.8                -0.5         -0.023
Jaren Jackson Jr.          3.6      0.3                -0.3         -0.037
Jarred Vanderbilt         -0.3      0.0                -0.1          0.151
Jarrett Allen              0.2      0.2                 0.7          0.059
Jaylen Brown               7.3      0.7                 0.2          0.016
Jayson Tatum               7.7      0.9                 0.1          0.000
Jeff Green                -2.9     -0.8                -0.2         -0.013
Jeff Teague               -1.2     -3.0                 0.1          0.013
Jerami Grant              -1.6      0.2                -0.4         -0.019
Jeremy Lamb               -2.8     -0.1                -0.3          0.011
Jerian Grant               0.3     -1.1                 0.0         -0.048
Jerome Robinson            1.7      0.8                 0.1         -0.029
Jevon Carter               0.5     -0.4                 0.1          0.113
Jimmy Butler               1.2      2.0                -0.1         -0.007
Joakim Noah               -4.3     -0.7                -0.4         -0.016
Joe Chealey               -1.5     -0.7                 0.0         -0.333
Joe Harris                 0.8     -0.3                 0.2         -0.014
Joe Ingles                -2.3     -0.5                 0.0         -0.003
Joel Embiid               -4.5     -0.7                 0.3         -0.007
John Collins               2.1     -0.5                -0.8          0.023
John Henson               -0.1      0.4                 0.1          0.093
Johnathan Motley          -2.4      0.1                -0.6          0.199
Johnathan Williams        -3.5      0.0                -0.4         -0.032
Jonah Bolden              -3.3     -0.9                -0.7         -0.130
Jonas Valančiūnas         -0.7      0.5                 0.8          0.026
Jonathan Isaac             2.3      0.3                 0.4          0.041
Jordan Bell               -0.1     -0.5                 0.2          0.006
Jordan Clarkson           -1.6     -0.5                -0.3          0.006
Jordan McRae               5.6      1.4                 0.3         -0.062
Josh Hart                  2.3      0.3                 0.4          0.016
Josh Jackson              -2.5     -0.7                -0.3          0.027
Josh Okogie                0.9      0.4                 0.8          0.041
Josh Richardson           -2.9     -1.2                 0.0          0.018
Jrue Holiday              -2.1     -1.0                 0.2         -0.017
Juancho Hernangómez        0.2      0.0                 0.0         -0.034
Julius Randle             -1.9      0.0                 0.2         -0.064
Justin Anderson           -0.9      0.3                -0.4         -0.145
Justin Holiday            -2.2     -0.5                -0.2          0.042
Justin Jackson            -1.7     -0.4                -0.1         -0.051
Justin Patton              0.1     -0.6                -0.5          0.114
Justise Winslow           -1.3     -0.3                 0.5         -0.045
Jusuf Nurkić               2.0      0.8                -0.5         -0.013
Kadeem Allen              -4.9     -1.9                -0.2         -0.029
Karl-Anthony Towns         2.1      1.0                -0.7         -0.010
Kawhi Leonard              0.5      1.6                -0.4         -0.026
Keita Bates-Diop           1.5      0.1                 0.1          0.004
Kelly Olynyk              -1.8     -0.1                -0.2         -0.001
Kelly Oubre Jr.            3.5      0.3                 0.2          0.007
Kemba Walker              -5.2     -1.1                 0.0         -0.009
Kenrich Williams          -2.6     -0.3                 0.1         -0.037
Kent Bazemore             -2.8     -0.9                -0.2         -0.027
Kentavious Caldwell-Pope  -2.1      0.3                 0.0          0.037
Kevin Huerter              2.5      0.9                -0.2         -0.006
Kevin Knox                -6.4     -0.2                -0.4         -0.011
Kevin Love                 0.6      1.0                -0.5          0.065
Kevon Looney              -2.9     -0.5                -1.0         -0.258
Khem Birch                -0.4      0.2                 0.3         -0.093
Khris Middleton            2.6      0.0                 0.1          0.056
Khyri Thomas              -0.2      0.1                -0.1         -0.025
Kostas Antetokounmpo       0.4      0.4                 0.4          1.000
Kris Dunn                 -4.0     -2.6                 0.1          0.019
Kyle Anderson             -2.2     -0.6                -0.2         -0.069
Kyle Korver               -1.9      0.0                 0.2          0.014
Kyle Kuzma                -5.9     -1.2                 0.0         -0.020
Kyle Lowry                 5.2     -1.2                 0.0          0.005
Kyle O'Quinn               0.0      0.6                 0.6         -0.013
Kyrie Irving               3.6     -0.5                 0.0         -0.009
LaMarcus Aldridge         -2.4      0.0                -1.2         -0.026
Lance Thomas              -1.1      0.3                -0.2         -0.048
Landry Shamet              0.2      0.4                -0.2         -0.027
Langston Galloway          1.9      0.4                -0.1          0.047
Larry Nance Jr.            0.7     -1.0                -0.6          0.011
Lauri Markkanen           -4.0      0.1                -0.2         -0.005
LeBron James              -2.1      1.9                 0.0         -0.017
Lonnie Walker IV           3.8      0.6                 0.4          0.078
Lonzo Ball                 1.9      1.6                 0.0         -0.003
Lou Williams              -1.8      0.2                 0.0         -0.007
Luc Mbah a Moute          -3.3     -0.5                -0.2         -0.044
Luka Dončić                7.6      2.8                 0.1          0.036
Luke Kennard               6.1      2.3                 0.1          0.004
Luke Kornet               -1.0     -0.3                 0.0          0.061
Malcolm Brogdon            0.9      3.9                -0.1         -0.067
Malcolm Miller            -2.2      0.3                 0.0         -0.009
Malik Beasley             -0.1      0.2                -0.1         -0.049
Malik Monk                 1.4      0.5                 0.3          0.047
Marc Gasol                -6.1     -1.1                -0.3         -0.021
Marco Belinelli           -4.2     -0.5                -0.1         -0.021
Marcus Morris              2.8     -0.1                 0.0         -0.009
Marcus Smart               4.0      0.9                 0.0         -0.047
Mario Hezonja             -4.0     -0.6                 0.1          0.010
Markelle Fultz             3.9      2.0                -0.8          0.046
Markieff Morris            0.3     -0.1                -0.4          0.022
Marquese Chriss            5.1      1.4                 1.0          0.173
Marvin Bagley III         -0.7     -0.2                -0.4         -0.037
Marvin Williams           -4.2     -0.2                -0.5          0.024
Mason Plumlee             -0.6     -0.5                -0.4          0.022
Matthew Dellavedova       -2.8     -0.6                 0.2         -0.051
Maurice Harkless          -1.9     -0.1                -0.4          0.015
Maxi Kleber                2.3      0.2                 0.2          0.008
Melvin Frazier             0.6      0.1                -0.2          0.108
Meyers Leonard             0.2     -0.1                -0.2         -0.036
Michael Carter-Williams    2.4     -0.1                 0.3          0.053
Michael Kidd-Gilchrist    -4.3     -0.4                -0.8         -0.143
Mikal Bridges              0.8     -0.3                 0.2          0.080
Mike Conley               -6.7     -2.0                 0.1         -0.029
Mike Muscala              -2.2     -0.3                -0.6          0.005
Mike Scott                 0.2      0.0                 0.5          0.026
Miles Bridges              5.5      0.6                 0.6         -0.040
Mitchell Robinson          2.4      0.0                 0.3          0.048
Mo Bamba                  -0.8     -0.1                 0.2         -0.019
Monte Morris              -1.4     -0.1                -0.1         -0.034
Montrezl Harrell           2.0     -0.3                 0.4         -0.035
Moritz Wagner              3.9      0.6                 0.8          0.130
Myles Turner              -1.2     -0.4                 0.0         -0.030
Naz Mitrou-Long            1.7      0.5                -0.1          0.053
Nemanja Bjelica            1.9      0.9                -0.1          0.002
Nerlens Noel               2.5      0.3                -0.1          0.097
Nicolas Batum             -5.7     -0.3                 0.2         -0.104
Nikola Jokić              -0.2     -0.3                -0.6          0.017
Nikola Vučević            -1.2     -0.2                -0.5         -0.041
Noah Vonleh               -4.7     -1.1                -0.7          0.095
Norman Powell              7.4      0.3                 0.2          0.012
OG Anunoby                 3.6      0.9                 0.3          0.052
Omari Spellman             1.7      0.0                 0.0          0.029
Otto Porter Jr.           -2.0     -0.3                -0.1         -0.022
P.J. Tucker               -0.4      0.4                 0.1          0.019
PJ Dozier                  2.6      1.4                -0.7          0.033
Pascal Siakam              6.0      0.4                -0.5         -0.096
Pat Connaughton           -1.5     -0.4                -0.1         -0.011
Patrick Beverley           0.3     -0.2                 0.1          0.024
Patrick McCaw              2.0      1.1                 0.3          0.001
Patrick Patterson          1.3      0.2                -0.1          0.034
Patty Mills                1.7     -1.2                 0.0          0.006
Paul George               -6.5     -0.2                -0.9          0.001
Paul Millsap              -1.0     -0.4                -0.3         -0.002
Quinn Cook                -1.8     -0.5                 0.0         -0.040
Rajon Rondo               -2.1     -3.0                -0.2          0.013
Raul Neto                 -0.2     -0.7                 0.0         -0.005
Reggie Bullock            -3.2     -0.6                 0.1         -0.010
Reggie Jackson            -3.5     -0.1                 0.0         -0.010
Richaun Holmes             4.1      0.1                 1.3          0.040
Ricky Rubio                0.3      2.7                 0.2          0.011
Robert Covington          -0.9      0.0                 0.1         -0.009
Robert Williams            2.7      0.7                 0.6          0.021
Robin Lopez               -4.1     -0.5                -1.1         -0.076
Rodions Kurucs            -3.9      0.3                -0.4         -0.004
Rodney Hood               -0.2     -0.3                 0.2          0.071
Rodney McGruder           -4.3     -1.1                -0.4         -0.005
Rondae Hollis-Jefferson   -1.9      0.2                 0.4          0.060
Royce O'Neale              1.1      1.0                 0.1         -0.042
Rudy Gay                  -2.9     -0.9                 0.0         -0.058
Rudy Gobert               -0.8     -0.5                -0.4          0.024
Russell Westbrook          4.3     -3.7                 0.3          0.044
Ryan Anderson              0.0      0.2                -0.7         -0.018
Ryan Arcidiacono          -2.2     -1.6                 0.0         -0.038
Ryan Broekhoff             0.2      0.1                 0.1         -0.079
Semi Ojeleye               0.1      0.1                 0.0         -0.016
Serge Ibaka                0.4      0.1                 0.0         -0.017
Seth Curry                 4.5      1.0                 0.0          0.039
Shabazz Napier             0.9      2.1                 0.2          0.023
Shai Gilgeous-Alexander    8.2      0.0                 0.0         -0.005
Shake Milton               5.0      1.7                -0.1          0.093
Shaquille Harrison        -1.6     -0.8                 0.0          0.035
Sindarius Thornwell        7.0      1.7                -0.1          0.198
Skal Labissière            2.8      0.8                 1.6          0.020
Solomon Hill               1.2      0.5                -0.3          0.014
Spencer Dinwiddie          3.8      2.2                 0.1         -0.027
Stanley Johnson           -4.5     -0.5                -0.2         -0.016
Stephen Curry             -6.5      1.4                 0.1         -0.070
Sterling Brown            -1.3     -0.4                 0.1         -0.094
Steven Adams              -3.0      0.7                -1.6         -0.003
Svi Mykhailiuk             5.8      1.0                 0.1          0.081
T.J. Leaf                 -0.9     -0.1                 0.1         -0.122
T.J. McConnell             0.1      1.6                 0.1         -0.009
T.J. Warren                1.8      0.0                 0.3          0.050
Taj Gibson                -4.7     -0.4                -0.7          0.018
Taurean Prince            -1.4     -0.3                 0.4         -0.065
Terrance Ferguson         -3.0     -0.1                 0.0         -0.074
Terrence Ross             -0.4     -0.5                -0.1         -0.025
Terry Rozier               9.0      1.2                 0.4          0.036
Thabo Sefolosha           -1.6      0.1                 0.3         -0.070
Thaddeus Young            -2.3     -0.7                -0.9         -0.079
Theo Pinson               -0.9      0.5                 0.1         -0.052
Thomas Bryant              2.7      0.5                 0.5         -0.035
Thon Maker                -0.3      0.0                 0.3          0.075
Tim Frazier               -1.7     -0.8                -0.4         -0.082
Tim Hardaway Jr.          -2.3     -0.5                -0.1          0.041
Timothé Luwawu-Cabarrot    3.2      0.1                 0.4          0.059
Tobias Harris             -0.4      0.4                 0.2         -0.016
Tomáš Satoranský           1.0      0.4                 0.2         -0.055
Tony Bradley              -0.8      0.1                -1.1          0.167
Tony Snell                 2.0      1.3                -0.2         -0.007
Torrey Craig              -0.3     -0.2                -0.1          0.019
Trae Young                10.5      1.2                -0.3          0.019
Treveon Graham            -0.9     -0.2                 0.2          0.025
Trevor Ariza              -4.5     -2.0                -0.1          0.039
Trey Burke                -3.5     -0.2                 0.0          0.019
Trey Lyles                -2.1     -0.3                 0.4          0.028
Tristan Thompson           1.1      0.1                 0.0         -0.017
Troy Brown Jr.             5.6      1.1                 0.4          0.024
Troy Daniels              -1.9     -0.1                 0.0         -0.024
Tyler Johnson             -3.9     -1.0                -0.1         -0.025
Tyler Zeller              -6.7     -0.7                -0.3         -0.283
Tyrone Wallace            -0.6      0.2                -0.1         -0.106
Tyson Chandler            -1.8     -0.5                -0.7          0.162
Tyus Jones                 0.5     -0.4                -0.2          0.044
Udonis Haslem              0.5      0.1                 0.0          0.031
Victor Oladipo            -4.3     -2.3                -0.1         -0.029
Vince Carter*             -2.4     -0.3                -0.1         -0.067
Wayne Ellington           -5.2     -0.2                -0.2         -0.052
Wendell Carter Jr.         1.0     -0.6                 1.2          0.049
Wes Iwundu                 0.8      0.1                 0.0          0.004
Wesley Matthews           -4.8     -0.9                -0.2         -0.004
Will Barton                3.6      0.8                 0.6          0.048
Willie Cauley-Stein       -4.7     -1.1                -0.6          0.023
Willy Hernangómez         -1.2     -0.1                -0.6          0.013
Wilson Chandler           -0.1     -0.5                -0.6         -0.014
Yogi Ferrell              -1.5     -0.5                -0.1         -0.015
Yuta Watanabe             -0.6     -0.2                 0.1          0.147
Zach Collins               0.4      0.6                 0.9         -0.002
Zach LaVine                1.8     -0.3                 0.1         -0.017
Zhaire Smith              -5.6     -1.4                -0.5         -0.139

Next, we create graphs that analyze the growth or decline in offensive performance, specifically in Assists Per Game (APG), across multiple basketball seasons, highlighting notable players and their improvements. A season-over-season difference only exists for players who were in the NBA for both seasons, so we first clean the data again and drop any player who was not in the NBA for 2 seasons. We then calculate the difference in assists per game and visualize it as a bar chart. The player with the highest growth in assists per game is highlighted in green, and the Most Improved Player is highlighted in red.
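The cleaning and difference steps described above can be sketched on toy data. This is only an illustration: the player names, the values, and the variable names `prev_season`, `curr_season`, and `assist_diff` are hypothetical and are not the tables built earlier in this notebook.

```python
import pandas as pd

# Hypothetical per-game assists for two consecutive seasons, indexed by player name.
prev_season = pd.DataFrame(
    {"assists": [4.0, 2.5, 6.1]},
    index=["Player A", "Player B", "Player C"],
)
curr_season = pd.DataFrame(
    {"assists": [5.5, 2.0, 3.0]},
    index=["Player A", "Player B", "Player D"],
)

# Subtraction aligns on the index, so a player missing from either season becomes NaN.
assist_diff = (curr_season - prev_season).round(1)

# Drop players who did not play in both seasons.
assist_diff = assist_diff.dropna()

# Sort so that the largest improvement comes first.
assist_diff = assist_diff.sort_values(by="assists", ascending=False)
print(assist_diff)
```

Because pandas aligns rows by index label before subtracting, Player C and Player D (each present in only one season) end up as NaN and are removed by `dropna()`, which mirrors the "in the NBA for 2 seasons" rule.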

In [44]:
def get_assists_diff(offense, mip, year):

    offense.dropna(inplace=True)

    # Sort the DataFrame by the difference in APG
    sorted_growth_df = offense.sort_values(by='assists', ascending=False)

    # Plotting the difference in APG for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['assists'], color='skyblue')
    plt.title(f'Difference in Assists Per Game (APG) in {year} from last season')
    plt.xlabel('Difference in APG')
    plt.ylabel('Players')

    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'assists']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest APG growth Player')

    # Annotating the highest difference at the y position of that player's bar
    plt.text(highest_difference, sorted_growth_df.index.get_loc(highest_difference_player), f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')

    # Highlighting the award player
    if mip in offense.index:
        award_player_difference = offense.loc[mip, 'assists']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Bars are drawn in sorted order, so look up the position in the sorted table
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')


    plt.legend(loc= 'lower left')
    plt.tight_layout()
    plt.show()

for offense, mip, year in zip(allOffense, pastMipWinners, years):
    get_assists_diff(offense, mip, year)

Next, we create graphs that analyze the growth or decline in offensive performance, specifically in offensive rebounds per game (OR), across multiple basketball seasons, highlighting notable players and their improvements. As before, we first clean the data, because a player needs to have been in the NBA for 2 seasons for a difference to exist. Any player who does not meet that requirement is dropped from the table. We then calculate the difference in offensive rebounds per game and visualize it as a bar chart. The player with the highest growth in offensive rebounds per game is highlighted in green, and the Most Improved Player is highlighted in red.
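The plotting functions in this section differ only in the column they read, so the lookup they share can be written once. The following is a minimal sketch under stated assumptions, not the notebook's own code: the helper name `find_highlights` and the toy table are hypothetical.

```python
import pandas as pd

def find_highlights(offense, column, mip):
    """Return (top_player, top_value, mip_position) for one stat column.

    mip_position is the row position of the MIP in the sorted table,
    or None when the MIP is not present in the table.
    """
    # Work on a cleaned, sorted copy; the caller's DataFrame is not modified.
    ranked = offense.dropna().sort_values(by=column, ascending=False)
    top_player = ranked.index[0]
    top_value = ranked.loc[top_player, column]
    mip_position = ranked.index.get_loc(mip) if mip in ranked.index else None
    return top_player, top_value, mip_position

# Toy table of season-over-season differences (hypothetical values).
toy = pd.DataFrame(
    {"offensive rebounds": [0.2, 1.1, -0.4]},
    index=["Player A", "Player B", "Player C"],
)
print(find_highlights(toy, "offensive rebounds", "Player A"))
```

The position is taken from the sorted table because horizontal bars drawn from a sorted index are placed at y = 0, 1, 2, ... in that same order, so this is the coordinate a text annotation for that player needs.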

In [43]:
def get_or_diff(offense, mip, year):

    # Drop players who did not play in both seasons (this modifies the caller's DataFrame in place)
    offense.dropna(inplace=True)

    # Sort the DataFrame by the difference in offensive rebounds per game
    sorted_growth_df = offense.sort_values(by='offensive rebounds', ascending=False)

    # Plotting the difference in offensive rebounds per game for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['offensive rebounds'], color='skyblue')
    plt.title(f'Difference in Offensive Rebounds Per Game (OR) in {year} from last season')
    plt.xlabel('Difference in OR')
    plt.ylabel('Players')

    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'offensive rebounds']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest Offensive Rebounds growth Player')

    # Annotating the highest difference; the first bar drawn sits at y position 0
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha= 'right', color= 'green')

    # Highlighting the award player
    if mip in offense.index:
        award_player_difference = offense.loc[mip, 'offensive rebounds']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Bars are drawn in sorted order, so look up the y position in the sorted index
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip),f'{mip}: {award_player_difference}', va='bottom', ha= 'right', color = 'red')

    plt.legend(loc= 'lower left')
    plt.tight_layout()
    plt.show()

yc =0
for offense in allOffense:
    mip = pastMipWinners[yc]
    get_or_diff(offense, mip, years[yc])
    yc+=1

Next, we create graphs that show the growth or decline in offensive performance, specifically field goal percentage (FGP), from one season to the next, highlighting notable players and their improvements. A player must have been in the NBA for both seasons being compared, so we first clean the data by dropping any player who is missing from either season, then calculate the difference in field goal percentage, and finally visualize it in a bar chart. The player with the highest growth in field goal percentage is highlighted in green and the Most Improved Player is highlighted in red.
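
One detail is worth checking when annotating these horizontal bar charts. As far as we understand matplotlib's categorical axes, bars are placed at positions 0, 1, 2, ... in the order their labels are first plotted, so the position of a text label should come from the sorted index that was plotted rather than from the original, unsorted one. A minimal sketch with made-up players and values:

```python
import pandas as pd

# Hypothetical changes in field goal percentage, in the order they were scraped
fgp_growth = pd.Series({'Player A': 0.010, 'Player B': 0.045, 'Player C': -0.020})

# The charts plot the values sorted from highest to lowest growth
sorted_growth = fgp_growth.sort_values(ascending=False)

# Position of the same player in the unsorted and sorted index
pos_unsorted = fgp_growth.index.get_loc('Player B')
pos_sorted = sorted_growth.index.get_loc('Player B')
print(pos_unsorted, pos_sorted)
```

The two positions differ whenever sorting moves the player, which is why the sorted index is the safer lookup for the annotation.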

In [42]:
def get_fgp_diff(offense, mip, year):

    
    # Drop players who did not play in both seasons (this modifies the caller's DataFrame in place)
    offense.dropna(inplace=True)

    # Sort the DataFrame by the difference in field goal percentage
    sorted_growth_df = offense.sort_values(by='fg percentage', ascending=False)

    # Plotting the difference in field goal percentage for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['fg percentage'], color='skyblue')
    plt.title(f'Difference in Field Goal Percentage (FGP) in {year} From Last Season')
    plt.xlabel('Difference in FGP')
    plt.ylabel('Players')

    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'fg percentage']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest FGP growth Player')

    # Annotating the highest difference; the first bar drawn sits at y position 0
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha= 'right', color= 'green')

    # Highlighting the award player
    if mip in offense.index:
        award_player_difference = offense.loc[mip, 'fg percentage']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Bars are drawn in sorted order, so look up the y position in the sorted index
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip),f'{mip}: {award_player_difference}', va='bottom', ha= 'right', color = 'red')


    plt.legend(loc= 'lower left')
    plt.tight_layout()
    plt.show()

yc =0
for offense in allOffense:
    mip = pastMipWinners[yc]
    get_fgp_diff(offense, mip, years[yc])
    yc+=1

This next section conducts a regression analysis of the Most Improved Player voting ranks against the players' statistics for each season. We analyze the relationship between the season-over-season changes in offensive statistics and MIP rank, fit a model that predicts rank from those changes, and evaluate the accuracy of the predictions on held-out players. The predictors are the offensive stats examined above: points per game, assists per game, offensive rebounds per game, and field goal percentage.
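
To make the two evaluation metrics concrete, the sketch below computes mean squared error and R-squared by hand for a deliberately bad set of predictions. The ranks are made up for illustration. It shows that R-squared on held-out data can be negative when the predictions are worse than always predicting the mean.

```python
import numpy as np

# Hypothetical actual and predicted ranks; the predictions are exactly reversed
actual_rank = np.array([1.0, 2.0, 3.0, 4.0])
predicted_rank = np.array([4.0, 3.0, 2.0, 1.0])

# Mean squared error: the average squared gap between actual and predicted rank
mse = np.mean((actual_rank - predicted_rank) ** 2)

# R-squared: 1 minus the ratio of residual error to the error of predicting the mean
ss_res = np.sum((actual_rank - predicted_rank) ** 2)
ss_tot = np.sum((actual_rank - actual_rank.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
```

Here the residual error is four times the error of the mean-only baseline, so R-squared comes out at -3.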

In [87]:
# Fits an OLS model of MIP voting rank on the difference in each statistic; the coefficients estimate how each difference relates to rank
def regression2(pureOffensive, mipdata,year):
    pureOffensive = pd.merge(pureOffensive, mipdata, left_index=True, right_index=True)
    pureOffensive['Rank'] = pureOffensive['Rank'].astype(int)
    X = pureOffensive[['ppg', 'assists', 'offensive rebounds', 'fg percentage']]
    y = pureOffensive['Rank']
    X = sm.add_constant(X)
    
    # Hold out 60% of the rows for testing; only the remaining 40% are used to fit the model
    X1_train, X1_test, y1_train, y1_test = train_test_split(X, y, test_size=0.6, random_state=42)
    
    # Train the model
    model1 = sm.OLS(y1_train, X1_train).fit()
    print(model1.summary())

    # Make predictions
    y1_pred = model1.predict(X1_test)
    y1_pred = np.maximum(y1_pred,1)

    # Calculate evaluation metrics
    mse1 = mean_squared_error(y1_test, y1_pred)
    r21 = r2_score(y1_test, y1_pred)
    print(f'Mean Squared Error: {mse1}')
    print(f'R-squared: {r21}')

    # Plot the actual vs predicted values
    plt.figure(figsize=(10, 6))
    plt.scatter(y1_test, y1_pred, color='skyblue')
    plt.plot([min(y1_test), max(y1_test)], [min(y1_test), max(y1_test)], color='red', linewidth=2)
    plt.title(f'Actual vs Predicted Ranks {year} Season')
    plt.xlabel('Actual Rank')
    plt.ylabel('Predicted Rank')
    plt.show()
    
listMip = [mipdata1920,mipdata2021,mipdata2122,mipdata2223,mipdata2324]
for i,j in enumerate(allOffense):
    regression2(j, listMip[i], years[i])
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       0.979
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     11.78
Date:                Fri, 17 May 2024   Prob (F-statistic):              0.215
Time:                        16:30:55   Log-Likelihood:                -4.1250
No. Observations:                   6   AIC:                             18.25
Df Residuals:                       1   BIC:                             17.21
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 20.1963      3.308      6.105      0.103     -21.835      62.227
ppg                   -1.6348      0.478     -3.420      0.181      -7.708       4.439
assists               -0.3771      0.385     -0.979      0.507      -5.273       4.519
offensive rebounds     7.4167      3.332      2.226      0.269     -34.926      49.759
fg percentage        133.5644     25.261      5.287      0.119    -187.411     454.540
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.589
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.541
Skew:                           0.651   Prob(JB):                        0.763
Kurtosis:                       2.315   Cond. No.                         480.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 60.39356303772042
R-squared: -2.524408217619131
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 6 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       0.638
Model:                            OLS   Adj. R-squared:                  0.348
Method:                 Least Squares   F-statistic:                     2.200
Date:                Fri, 17 May 2024   Prob (F-statistic):              0.205
Time:                        16:30:55   Log-Likelihood:                -28.556
No. Observations:                  10   AIC:                             67.11
Df Residuals:                       5   BIC:                             68.63
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 19.9431      5.654      3.527      0.017       5.408      34.478
ppg                   -1.9155      0.832     -2.302      0.070      -4.054       0.223
assists                0.4932      2.598      0.190      0.857      -6.185       7.172
offensive rebounds     5.8328      6.788      0.859      0.429     -11.617      23.283
fg percentage         15.3430     93.296      0.164      0.876    -224.482     255.167
==============================================================================
Omnibus:                        0.225   Durbin-Watson:                   2.319
Prob(Omnibus):                  0.894   Jarque-Bera (JB):                0.388
Skew:                           0.186   Prob(JB):                        0.824
Kurtosis:                       2.110   Cond. No.                         289.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 39.694532862511245
R-squared: -0.4419228114409153
/opt/homebrew/lib/python3.11/site-packages/scipy/stats/_stats_py.py:1971: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  k, _ = kurtosistest(a, axis)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 4 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: divide by zero encountered in divide
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: invalid value encountered in scalar multiply
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1717: RuntimeWarning: divide by zero encountered in scalar divide
  return np.dot(wresid, wresid) / self.df_resid
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Fri, 17 May 2024   Prob (F-statistic):                nan
Time:                        16:30:55   Log-Likelihood:                 120.86
No. Observations:                   4   AIC:                            -233.7
Df Residuals:                       0   BIC:                            -236.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 10.4852        inf          0        nan         nan         nan
ppg                   -0.2691        inf         -0        nan         nan         nan
assists               -1.4575        inf         -0        nan         nan         nan
offensive rebounds   -16.2889        inf         -0        nan         nan         nan
fg percentage         -2.8961        inf         -0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.017
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.637
Skew:                           0.035   Prob(JB):                        0.727
Kurtosis:                       1.046   Cond. No.                         70.6
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
Mean Squared Error: 28.22721138842799
R-squared: -1.5807736126562735
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: divide by zero encountered in divide
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: invalid value encountered in scalar multiply
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1717: RuntimeWarning: divide by zero encountered in scalar divide
  return np.dot(wresid, wresid) / self.df_resid
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Fri, 17 May 2024   Prob (F-statistic):                nan
Time:                        16:30:55   Log-Likelihood:                 158.82
No. Observations:                   5   AIC:                            -307.6
Df Residuals:                       0   BIC:                            -309.6
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                  8.7721        inf          0        nan         nan         nan
ppg                   -1.2995        inf         -0        nan         nan         nan
assists                3.4118        inf          0        nan         nan         nan
offensive rebounds     1.0407        inf          0        nan         nan         nan
fg percentage         58.7885        inf          0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.429
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.524
Skew:                           0.481   Prob(JB):                        0.770
Kurtosis:                       1.739   Cond. No.                         380.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 11.009424092482377
R-squared: -1.2087872787425455
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: divide by zero encountered in divide
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: invalid value encountered in scalar multiply
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1717: RuntimeWarning: divide by zero encountered in scalar divide
  return np.dot(wresid, wresid) / self.df_resid
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Fri, 17 May 2024   Prob (F-statistic):                nan
Time:                        16:30:55   Log-Likelihood:                 160.01
No. Observations:                   5   AIC:                            -310.0
Df Residuals:                       0   BIC:                            -312.0
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 10.9733        inf          0        nan         nan         nan
ppg                   -3.7525        inf         -0        nan         nan         nan
assists               10.6587        inf          0        nan         nan         nan
offensive rebounds     3.7924        inf          0        nan         nan         nan
fg percentage         67.0659        inf          0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.038
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.721
Skew:                          -0.492   Prob(JB):                        0.697
Kurtosis:                       1.422   Cond. No.                         313.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 69.60874222280657
R-squared: -3.097607645383235

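When analyzing the regressions of the offensive stats, it is important to note the small sample size: each season's Most Improved Player voting table lists only a limited number of ranked players, and the training split keeps only 40% of them, so the regression may not have enough samples to produce a reliable model. For the 19/20 Most Improved Player table, the F-test gave a p-value of 0.215, meaning the model as a whole was not statistically significant. Offensive rebounds and field goal percentage both had positive coefficients, meaning larger gains in those stats were associated with a numerically higher rank, but every parameter was statistically insignificant. For the 20/21 season, the negative relationship between points per game and rank was close to statistically significant, with a p-value of 0.070 (0.05 or below is needed for significance). For the 21/22, 22/23, and 23/24 seasons the model was fit on only 4 or 5 observations, leaving zero residual degrees of freedom, so the in-sample R-squared of 1.000 reflects a perfect fit to the training points rather than predictive power. In every season the R-squared on the held-out players was negative, meaning the predictions were worse than simply predicting the mean rank.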
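
The degrees-of-freedom problem can be reproduced on random data. The sketch below is an illustration only: it uses NumPy's least-squares solver rather than statsmodels, and the inputs are random numbers, not player statistics. With five observations and five parameters (a constant plus four predictors), the fit passes through every training point no matter what the data are.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_features = 5, 4

# Design matrix: a constant column plus four random predictors (pure noise)
X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, n_features))])
y = rng.normal(size=n_obs)

# Ordinary least squares fit
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

# Residual degrees of freedom: observations minus estimated parameters
df_resid = n_obs - X.shape[1]

# In-sample R-squared
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_train = 1 - ss_res / ss_tot

print(f'Residual degrees of freedom: {df_resid}')
print(f'In-sample R-squared: {r2_train:.3f}')
```

An in-sample R-squared of 1.000 on noise shows why the matching values in the summaries above should not be read as evidence of a good model.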

Here, we visualize the correlations between the changes in offensive player statistics and the Most Improved Player voting data for the 2019-2020 basketball season using a heat map. The cells are color-coded by correlation value: positive correlations appear in reddish tones and negative correlations in bluish tones.

In [381]:
def corrMOffensive1820(pureOffensive1820,mipdata1920):
    pureOffensive1820 = pd.merge(pureOffensive1820, mipdata1920, left_index=True, right_index=True)

    correlation_matrix = pureOffensive1820.corr()

    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2018-2020 Seasons for MIP')
    plt.show()
corrMOffensive1820(pureOffensive1820,mipdata1920)

# direct positive correlations are redder and negative correlations are bluer 

Here, we visualize the correlations between the changes in offensive player statistics and the Most Improved Player voting data for the 2020-2021 basketball season using a heat map. The cells are color-coded by correlation value: positive correlations appear in reddish tones and negative correlations in bluish tones.

In [383]:
def corrMOffensive1921(pureOffensive1921,mipdata2021):   
    pureOffensive1921 = pd.merge(pureOffensive1921, mipdata2021, left_index=True, right_index=True)
    correlation_matrix1 = pureOffensive1921.corr()
    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix1, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2019-2021 Seasons for MIP')
    plt.show()
corrMOffensive1921(pureOffensive1921,mipdata2021)

Here, we visualize the correlations between the changes in offensive player statistics and the Most Improved Player voting data for the 2021-2022 basketball season using a heat map. The cells are color-coded by correlation value: positive correlations appear in reddish tones and negative correlations in bluish tones.

In [386]:
def corrMOffensive2022(pureOffensive2022,mipdata2122):   
    pureOffensive2022 = pd.merge(pureOffensive2022, mipdata2122, left_index=True, right_index=True)
    correlation_matrix2 = pureOffensive2022.corr()
    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix2, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2020-2022 Seasons for MIP')
    plt.show()
corrMOffensive2022(pureOffensive2022,mipdata2122)

Here, we visualize the correlations between the changes in offensive player statistics and the Most Improved Player voting data for the 2022-2023 basketball season using a heat map. The cells are color-coded by correlation value: positive correlations appear in reddish tones and negative correlations in bluish tones.

In [387]:
def corrMOffensive2123(pureOffensive2123,mipdata2223):   

    pureOffensive2123 = pd.merge(pureOffensive2123, mipdata2223, left_index=True, right_index=True)

    correlation_matrix3 = pureOffensive2123.corr()
    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix3, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2021-2023 Seasons for MIP')
    plt.show()
corrMOffensive2123(pureOffensive2123,mipdata2223)  

Here, we visualize the correlations between the changes in offensive player statistics and the Most Improved Player voting data for the 2023-2024 basketball season using a heat map. The cells are color-coded by correlation value: positive correlations appear in reddish tones and negative correlations in bluish tones.

In [389]:
def corrMOffensive2224(pureOffensive2224,mipdata2324):   

    pureOffensive2224 = pd.merge(pureOffensive2224, mipdata2324, left_index=True, right_index=True)
    correlation_matrix4 = pureOffensive2224.corr()
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2022-2024 Seasons for MIP')
    plt.show()
corrMOffensive2224(pureOffensive2224,mipdata2324)   
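
The heat maps show every pairwise correlation, but the question in this tutorial mostly concerns the `Rank` column. A minimal sketch of pulling just those correlations out of a merged table follows. The data are made up for illustration, and the sketch assumes only that the merged frame has a numeric `Rank` column, as the regression above does.

```python
import pandas as pd

# Hypothetical merged table: stat differences plus MIP voting rank
merged = pd.DataFrame(
    {
        'ppg': [6.0, 4.5, 3.0, 2.0, 0.5],
        'assists': [0.2, 1.5, 0.4, 1.1, 0.9],
        'Rank': [1, 2, 3, 4, 5],
    },
    index=['Player A', 'Player B', 'Player C', 'Player D', 'Player E'],
)

# Keep only each stat's correlation with Rank, most negative first
rank_corr = merged.corr()['Rank'].drop('Rank').sort_values()
print(rank_corr)
```

A strongly negative value means larger gains in that stat go with a numerically lower rank in this toy table.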

Defensive stats

Next, we create graphs that show the growth or decline in defensive performance, specifically Blocks Per Game (BPG), from one season to the next, highlighting notable players and their improvements. A player must have been in the NBA for both seasons being compared, so we first clean the data by dropping any player who is missing from either season, then calculate the difference in blocks, and finally visualize it in a bar chart. The player with the highest growth in blocks per game is highlighted in green and the Most Improved Player is highlighted in red.
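
The same cleaning and differencing can be written as a loop over consecutive seasons instead of one statement per season. The sketch below is one possible arrangement, not the code used in this tutorial. The season tables, player names, and values are hypothetical; it assumes the scraped stats arrive as strings with empty strings for missing values.

```python
import numpy as np
import pandas as pd

cols = ['blocks', 'steals', 'defensive rebounds']
players = ['Player A', 'Player B']

# Hypothetical scraped tables, keyed by season label in chronological order
seasons = {
    '18-19': pd.DataFrame({'blocks': ['0.5', '1.0'], 'steals': ['1.0', ''],
                           'defensive rebounds': ['4.0', '6.5']}, index=players),
    '19-20': pd.DataFrame({'blocks': ['1.5', '1.2'], 'steals': ['1.2', '0.8'],
                           'defensive rebounds': ['5.0', '6.0']}, index=players),
    '20-21': pd.DataFrame({'blocks': ['1.0', '1.5'], 'steals': ['1.5', '1.0'],
                           'defensive rebounds': ['5.5', '7.0']}, index=players),
}

# Replace empty strings with NaN, then convert every stat to float
clean = {label: df[cols].replace('', np.nan).astype(float)
         for label, df in seasons.items()}

# Subtract each season from the one that follows it, dropping incomplete rows
labels = list(clean)
diffs = {later: clean[later].sub(clean[earlier]).dropna()
         for earlier, later in zip(labels, labels[1:])}

for label, diff in diffs.items():
    print(label)
    print(diff)
```

Keying the results by the later season's label keeps each table of differences paired with the season in which the award was given.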

In [90]:
# Store the blocks, steals and defensive rebounds in a data frame and replace empty strings with NaN so incomplete rows can be dropped later
pureDefensive1819 = data1819[['blocks','steals','defensive rebounds']].replace('', np.nan) 
pureDefensive1920 = data1920[['blocks','steals','defensive rebounds']].replace('', np.nan)
pureDefensive2021 = data2021[['blocks','steals','defensive rebounds']].replace('', np.nan) 
pureDefensive2122 = data2122[['blocks','steals','defensive rebounds']].replace('', np.nan)
pureDefensive2223 = data2223[['blocks','steals','defensive rebounds']].replace('', np.nan) 
pureDefensive2324 = data2324[['blocks','steals','defensive rebounds']].replace('', np.nan)

# Convert each of the stats to floats so they can be subtracted across seasons
pureDefensive1819 = pureDefensive1819.astype(float)
pureDefensive1920 = pureDefensive1920.astype(float)
pureDefensive2021 = pureDefensive2021.astype(float)
pureDefensive2122 = pureDefensive2122.astype(float)
pureDefensive2223 = pureDefensive2223.astype(float)
pureDefensive2324 = pureDefensive2324.astype(float)
# Subtract each season from the one that follows it, so growth is positive and decline is negative
pureDefensive1820 = pureDefensive1920.sub(pureDefensive1819)
pureDefensive1921 = pureDefensive2021.sub(pureDefensive1920)
pureDefensive2022 = pureDefensive2122.sub(pureDefensive2021)
pureDefensive2123 = pureDefensive2223.sub(pureDefensive2122)
pureDefensive2224 = pureDefensive2324.sub(pureDefensive2223)

allDefense = [pureDefensive1820, pureDefensive1921, pureDefensive2022, pureDefensive2123, pureDefensive2224]
yc = 0
years = ["19-20", "20-21", "21-22", "22-23", "23-24"]

def get_blocks_diff(defense, mip, year):

    
    # Drop players who did not play in both seasons (this modifies the caller's DataFrame in place)
    defense.dropna(inplace=True)

    # Sort the DataFrame by the difference in blocks per game
    sorted_growth_df = defense.sort_values(by='blocks', ascending=False)

    # Plotting the difference in blocks per game for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['blocks'], color='skyblue')
    plt.title(f'Difference in Blocks Per Game (BPG) in {year} from last season')
    plt.xlabel('Difference in BPG')
    plt.ylabel('Players')

    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'blocks']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest BPG Growth Player')

    # Annotating the highest difference; the first bar drawn sits at y position 0
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha= 'right', color= 'green')

    # Highlighting the award player
    if mip in defense.index:
        award_player_difference = defense.loc[mip, 'blocks']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Bars are drawn in sorted order, so look up the y position in the sorted index
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip),f'{mip}: {award_player_difference}', va='bottom', ha= 'right', color = 'red')


    plt.legend(loc= 'lower left')
    plt.tight_layout()
    plt.show()

count = 0
for defense in allDefense:
    # Index by count so each season is paired with its own MIP winner
    mip = pastMipWinners[count]
    get_blocks_diff(defense, mip, years[count])
    count+=1

Next, we create graphs that show the growth or decline in defensive performance, specifically Steals Per Game (SPG), from one season to the next, highlighting notable players and their improvements. A player must have been in the NBA for both seasons being compared, so we first clean the data by dropping any player who is missing from either season, then calculate the difference in steals, and finally visualize it in a bar chart. The player with the highest growth in steals per game is highlighted in green and the Most Improved Player is highlighted in red.

In [91]:
def get_steals_diff(defense, mip, year):

    defense.dropna(inplace=True)

    # Sort the DataFrame by the difference in steals per game
    sorted_growth_df = defense.sort_values(by='steals', ascending=False)

    # Plotting the difference in steals per game for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['steals'], color='skyblue')
    plt.title(f'Difference in Steals Per Game (SPG) in {year} from last season')
    plt.xlabel('Difference in SPG')
    plt.ylabel('Players')

    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'steals']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest SPG Growth Player')

    # Annotating the highest difference; the DataFrame is sorted in descending order,
    # so the top player's bar is the first category, at y-position 0
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')

    # Highlighting the award player
    if mip in defense.index:
        award_player_difference = defense.loc[mip, 'steals']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Look up the bar position in the sorted index, which is the order the bars were drawn in
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')

    plt.legend(loc= 'lower left')
    plt.tight_layout()
    plt.show()

count = 0
for defense in allDefense:
    mip = pastMipWinners[count]
    get_steals_diff(defense, mip, years[count])
    count+=1

Next, we create graphs that analyze the growth or decline in defensive performance in terms of Defensive Rebounds Per Game (DRPG) across multiple basketball seasons, highlighting notable players and their improvements. As before, a season-over-season difference only exists for players who were in the NBA for both seasons, so we first clean the data and drop any player without two seasons. We then calculate the difference in defensive rebounds and visualize it as a bar chart. The player with the highest growth in defensive rebounds per game is highlighted in green and the Most Improved Player is highlighted in red.

In [41]:
def get_Dr_diff(defense, mip, year):

    # Any player that cannot stay in the league will have NaN values and thus will be dropped from the table
    defense.dropna(inplace=True)

    # Sort the DataFrame by the difference in DRPG
    sorted_growth_df = defense.sort_values(by='defensive rebounds', ascending=False)

    # Plotting the difference in DRPG for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['defensive rebounds'], color='skyblue')
    plt.title(f'Difference in Defensive Rebounds Per Game (DRPG) in {year} from last season')
    plt.xlabel('Difference in DRPG')
    plt.ylabel('Players')

    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'defensive rebounds']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest DRPG Growth Player')

    # Annotating the highest difference; the DataFrame is sorted in descending order,
    # so the top player's bar is the first category, at y-position 0
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')

    # Highlighting the award player
    if mip in defense.index:
        award_player_difference = defense.loc[mip, 'defensive rebounds']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Look up the bar position in the sorted index, which is the order the bars were drawn in
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')

    plt.legend(loc= 'lower left')
    plt.tight_layout()
    plt.show()

count = 0
for defense in allDefense:
    mip = pastMipWinners[count]
    get_Dr_diff(defense, mip, years[count])
    count+=1

This next section conducts a regression analysis on the player rankings and their statistics for each year. We analyze the relationship between defensive player statistics and player rankings across different basketball seasons, predicting player rankings from defensive performance and evaluating the accuracy of these predictions. We use the defensive statistics found above: blocks per game, steals per game, and defensive rebounds per game.
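
To make the two evaluation metrics concrete, here is a small hand-rolled sketch of mean squared error and R-squared using the standard definitions (R-squared as one minus the ratio of residual to total sum of squares). The ranks and predictions are made-up numbers, and the helper names `mse` and `r_squared` are illustrative rather than part of the notebook.

```python
def mse(actual, predicted):
    """Mean of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical held-out ranks and two sets of predictions
actual_ranks = [1, 2, 3, 4]
good_preds = [1.5, 2.0, 2.5, 4.0]
poor_preds = [4.0, 1.0, 5.0, 1.0]

print("good:", mse(actual_ranks, good_preds), r_squared(actual_ranks, good_preds))
print("poor:", mse(actual_ranks, poor_preds), r_squared(actual_ranks, poor_preds))
```

The second set of predictions has a larger squared error than a model that always guesses the mean rank, so its R-squared is negative. Negative values on held-out data are therefore possible and indicate a poor fit, not a calculation error.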

In [95]:
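def regression3(pureDefensive, mipdata,year):
    # Inner join on the index: keep only players present in both the defensive stats and the MIP data
    pureDefensive = pd.merge(pureDefensive, mipdata, left_index=True, right_index=True)
    pureDefensive['Rank'] = pureDefensive['Rank'].astype(int)
    # Predictors are the three defensive statistics; the target is the player's rank
    X = pureDefensive[['blocks', 'steals', 'defensive rebounds']]
    y = pureDefensive['Rank']
    # Add the intercept column expected by statsmodels OLS
    X = sm.add_constant(X)
    
    # test_size=0.6 holds out 60% of the rows for testing and trains on the remaining 40%
    X1_train, X1_test, y1_train, y1_test = train_test_split(X, y, test_size=0.6, random_state=42)
    
    # Train the model
    model1 = sm.OLS(y1_train, X1_train).fit()
    print(model1.summary())

    # Make predictions; a rank cannot be lower than 1, so clip predictions at 1
    y1_pred = model1.predict(X1_test)
    y1_pred = np.maximum(y1_pred,1)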


    # Calculate evaluation metrics
    mse1 = mean_squared_error(y1_test, y1_pred)
    r21 = r2_score(y1_test, y1_pred)
    print(f'Mean Squared Error: {mse1}')
    print(f'R-squared: {r21}')

    # Plot the actual vs predicted values
    plt.figure(figsize=(10, 6))
    plt.scatter(y1_test, y1_pred, color='skyblue')
    plt.plot([min(y1_test), max(y1_test)], [min(y1_test), max(y1_test)], color='red', linewidth=2)
    plt.title(f'Actual vs Predicted Ranks {year} Season')
    plt.xlabel('Actual Rank')
    plt.ylabel('Predicted Rank')
    plt.show()
    
for i,j in enumerate(allDefense):
    regression3(j, listMip[i], years[i])
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       0.326
Model:                            OLS   Adj. R-squared:                 -0.684
Method:                 Least Squares   F-statistic:                    0.3229
Date:                Fri, 17 May 2024   Prob (F-statistic):              0.814
Time:                        16:41:30   Log-Likelihood:                -14.560
No. Observations:                   6   AIC:                             37.12
Df Residuals:                       2   BIC:                             36.29
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 16.9049      8.136      2.078      0.173     -18.103      51.913
blocks                 1.3807     18.161      0.076      0.946     -76.761      79.522
steals                -4.6993      7.919     -0.593      0.613     -38.773      29.374
defensive rebounds    -3.3252      3.910     -0.850      0.485     -20.149      13.499
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.396
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.260
Skew:                          -0.402   Prob(JB):                        0.878
Kurtosis:                       2.374   Cond. No.                         19.9
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 43.6148118407519
R-squared: -1.5452447832139073
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 6 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       0.343
Model:                            OLS   Adj. R-squared:                  0.015
Method:                 Least Squares   F-statistic:                     1.045
Date:                Fri, 17 May 2024   Prob (F-statistic):              0.438
Time:                        16:41:30   Log-Likelihood:                -31.531
No. Observations:                  10   AIC:                             71.06
Df Residuals:                       6   BIC:                             72.27
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 14.2193      3.388      4.197      0.006       5.928      22.510
blocks                -4.9871     15.717     -0.317      0.762     -43.444      33.470
steals                 2.7433      6.080      0.451      0.668     -12.134      17.620
defensive rebounds    -4.3028      2.800     -1.537      0.175     -11.153       2.548
==============================================================================
Omnibus:                        0.088   Durbin-Watson:                   1.907
Prob(Omnibus):                  0.957   Jarque-Bera (JB):                0.313
Skew:                          -0.016   Prob(JB):                        0.855
Kurtosis:                       2.134   Cond. No.                         9.76
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 19.123526843848808
R-squared: 0.3053287794856343
/opt/homebrew/lib/python3.11/site-packages/scipy/stats/_stats_py.py:1971: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  k, _ = kurtosistest(a, axis)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 4 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: divide by zero encountered in divide
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: invalid value encountered in scalar multiply
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1717: RuntimeWarning: divide by zero encountered in scalar divide
  return np.dot(wresid, wresid) / self.df_resid
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Fri, 17 May 2024   Prob (F-statistic):                nan
Time:                        16:41:30   Log-Likelihood:                 116.33
No. Observations:                   4   AIC:                            -224.7
Df Residuals:                       0   BIC:                            -227.1
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                210.0000        inf          0        nan         nan         nan
blocks              -260.0000        inf         -0        nan         nan         nan
steals               -25.0000        inf         -0        nan         nan         nan
defensive rebounds  -115.0000        inf         -0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.157
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.485
Skew:                           0.682   Prob(JB):                        0.785
Kurtosis:                       1.976   Cond. No.                         345.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 9218.062499999956
R-squared: -841.7942857142817
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       0.869
Model:                            OLS   Adj. R-squared:                  0.476
Method:                 Least Squares   F-statistic:                     2.213
Date:                Fri, 17 May 2024   Prob (F-statistic):              0.450
Time:                        16:41:30   Log-Likelihood:                -9.4803
No. Observations:                   5   AIC:                             26.96
Df Residuals:                       1   BIC:                             25.40
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                  3.9383      2.129      1.850      0.316     -23.117      30.993
blocks                16.3273      7.800      2.093      0.284     -82.785     115.440
steals               -25.7250     11.643     -2.209      0.271    -173.668     122.218
defensive rebounds    -3.4665      1.833     -1.891      0.310     -26.761      19.828
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.189
Prob(Omnibus):                    nan   Jarque-Bera (JB):                1.000
Skew:                          -1.081   Prob(JB):                        0.607
Kurtosis:                       2.652   Cond. No.                         9.18
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 33.52740546056749
R-squared: -5.7265014090166755
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   Rank   R-squared:                       0.925
Model:                            OLS   Adj. R-squared:                  0.701
Method:                 Least Squares   F-statistic:                     4.125
Date:                Fri, 17 May 2024   Prob (F-statistic):              0.344
Time:                        16:41:30   Log-Likelihood:                -6.4269
No. Observations:                   5   AIC:                             20.85
Df Residuals:                       1   BIC:                             19.29
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                  6.0057      2.466      2.435      0.248     -25.332      37.344
blocks                18.0569      7.869      2.295      0.262     -81.927     118.041
steals                 8.7817      4.485      1.958      0.301     -48.200      65.764
defensive rebounds    -3.6614      2.370     -1.545      0.366     -33.774      26.451
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.905
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.921
Skew:                           1.035   Prob(JB):                        0.631
Kurtosis:                       2.631   Cond. No.                         12.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 27.847719803603102
R-squared: -0.6392916454155892

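Across the five seasons of defensive data, the F-statistics, p-values, and coefficients show no statistically significant relationship between the defensive statistics and player rank. Where they can be computed, no predictor has a p-value below 0.05 in any season, and the F-test p-values range from 0.344 to 0.814. The training samples are very small (4 to 10 observations). In the third regression the model has as many parameters as observations, leaving zero residual degrees of freedom, which is why that summary reports an R-squared of 1.000 with undefined standard errors. On the held-out test data, R-squared is negative in four of the five seasons, meaning the model predicts worse than simply guessing the mean rank.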
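
As a sanity check on how small the samples are, the adjusted R-squared values in the summaries can be reproduced from the reported R-squared and observation counts. The sketch below uses the usual formula for a model with an intercept, 1 - (1 - R²)(n - 1)/(n - k - 1); the helper name `adjusted_r2` is illustrative, and the inputs are copied from the printed summaries, so small rounding differences are expected.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept.

    Returns NaN when there are no residual degrees of freedom,
    because the formula would divide by zero.
    """
    df_resid = n_obs - n_predictors - 1
    if df_resid <= 0:
        return float("nan")
    return 1 - (1 - r2) * (n_obs - 1) / df_resid


# (R-squared, number of training observations) copied from the five summaries above;
# each model has 3 predictors plus a constant
reported = [(0.326, 6), (0.343, 10), (1.000, 4), (0.869, 5), (0.925, 5)]

for r2, n_obs in reported:
    print(f"n={n_obs}: adjusted R-squared = {adjusted_r2(r2, n_obs, 3):.3f}")
```

With only one or two residual degrees of freedom, the penalty factor (n - 1)/(n - k - 1) is large, which is how an R-squared of 0.326 turns into a negative adjusted value. With four observations and four parameters the denominator is zero, matching the `nan` in the third summary.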

Here, we visualize the correlations between various defensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2019-2020 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values, where positive correlations are represented in reddish tones and negative correlations in bluish tones.
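
The numbers shown in each heatmap cell come from a pairwise correlation matrix. As a minimal sketch of that step on its own, the example below builds a small made-up table with the same column names used in this section and calls `DataFrame.corr()`, which computes Pearson correlations by default. The player labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical season-over-season differences and ranks for four players
df = pd.DataFrame(
    {
        "blocks": [0.1, 0.4, -0.2, 0.3],
        "steals": [0.5, 0.2, 0.1, -0.1],
        "defensive rebounds": [1.2, 0.8, 0.3, -0.4],
        "Rank": [1, 2, 3, 4],
    },
    index=["Player A", "Player B", "Player C", "Player D"],
)

# Pairwise Pearson correlation between every pair of columns
corr = df.corr()

# Correlation of each statistic with Rank, excluding Rank's correlation with itself
rank_corr = corr["Rank"].drop("Rank")
print(corr.round(2))
print(rank_corr.round(2))
```

Because a lower rank number means a better placement, a negative correlation between a statistic and `Rank` means that larger improvements in that statistic go with better placement. With only a handful of rows, as in this toy table, individual correlations can be large without being statistically meaningful.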

In [96]:
def corrMDefensive1820(pureDefensive1820,mipdata1920):   

    pureDefensive1820 = pd.merge(pureDefensive1820, mipdata1920, left_index=True, right_index=True)
    correlation_matrix4 = pureDefensive1820.corr()
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2018-2020 Seasons for MIP')
    plt.show()
corrMDefensive1820(pureDefensive1820,mipdata1920) 

Here, we visualize the correlations between various defensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2020-2021 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values, where positive correlations are represented in reddish tones and negative correlations in bluish tones.

In [423]:
def corrMDefensive1921(pureDefensive1921,mipdata2021):   

    pureDefensive1921 = pd.merge(pureDefensive1921, mipdata2021, left_index=True, right_index=True)
    correlation_matrix4 = pureDefensive1921.corr()
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2019-2021 Seasons for MIP')
    plt.show()
corrMDefensive1921(pureDefensive1921,mipdata2021) 

Here, we visualize the correlations between various defensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2021-2022 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values, where positive correlations are represented in reddish tones and negative correlations in bluish tones.

In [421]:
def corrMDeffensive2022(pureDefensive2022,mipdata2122):   

    pureDefensive2022 = pd.merge(pureDefensive2022, mipdata2122, left_index=True, right_index=True)
    correlation_matrix4 = pureDefensive2022.corr()
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2020-2022 Seasons for MIP')
    plt.show()
corrMDeffensive2022(pureDefensive2022,mipdata2122) 

Here, we visualize the correlations between various defensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2022-2023 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values, where positive correlations are represented in reddish tones and negative correlations in bluish tones.

In [428]:
def corrMDefensive2123(pureDefensive2123,mipdata2223):   

    pureDefensive2123 = pd.merge(pureDefensive2123, mipdata2223, left_index=True, right_index=True)
    correlation_matrix4 = pureDefensive2123.corr()
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2021-2023 Seasons for MIP')
    plt.show()
corrMDefensive2123(pureDefensive2123,mipdata2223) 

Here, we visualize the correlations between various defensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2023-2024 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values, where positive correlations are represented in reddish tones and negative correlations in bluish tones.

In [430]:
def corrMDefensive2224(pureDefensive2224,mipdata2324):   

    pureDefensive2224 = pd.merge(pureDefensive2224, mipdata2324, left_index=True, right_index=True)
    correlation_matrix4 = pureDefensive2224.corr()
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2022-2024 Seasons for MIP') 
    plt.show()
corrMDefensive2224(pureDefensive2224,mipdata2324) 

CONCLUSION

Based on the sample size of the data and the results, the parameters selected were not statistically significant in determining a player's rank in the running for the Most Improved Player (MIP) award.

The project analyzes basketball player performance data, comparing the MIP winners over multiple NBA seasons with the players who could have won based on statistics. First, the datasets containing offensive and defensive player statistics are cleaned and preprocessed to ensure data integrity. This involves extracting relevant metrics such as points, assists, rebounds, steals, and blocks, while handling missing values appropriately. Next, the project computes the differences in player performance metrics between consecutive seasons to identify significant improvements or declines over time. Regression analyses then explore the relationship between player statistics and rankings, allowing player rankings to be predicted from performance metrics. We use bar charts and heatmaps to present the findings in a clear and comprehensible manner. Overall, the project aims to provide insight into player development trends and the factors influencing MIP selections in the NBA.

Future studies could use data going back to the inception of the NBA to look for development patterns across players. Two constraints apply: some statistics were not tracked in the league's early years, and processing a much larger dataset requires more computing power.